Nicolas Ross | 25 May 18:20

rgmanager is jammed

I am in the process of upgrading one of our clusters from RHEL 6.1 to 
6.2. It's an 8-node cluster.

I am going one node at a time: stop all cluster resources, cman, 
rgmanager et al., run yum update, reboot, then move to the next node. 
The first one went fine.

On the second one, rgmanager started, but it doesn't seem to connect to 
the other nodes. I found this in dmesg:

INFO: task rgmanager:2901 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rgmanager     D 0000000000000000     0  2901   2900 0x00000080
  ffff880667299d48 0000000000000082 0000000000000000 ffff8806656aa318
  ffff88066729c378 0000000000000001 ffff880665bb31b0 00007fffc6c6fa20
  ffff88066635a678 ffff880667299fd8 000000000000f4e8 ffff88066635a678
Call Trace:
  [<ffffffff814ee6fe>] __mutex_lock_slowpath+0x13e/0x180
  [<ffffffff814ee59b>] mutex_lock+0x2b/0x50
  [<ffffffffa02c192c>] dlm_new_lockspace+0x3c/0xa30 [dlm]
  [<ffffffff8115f74c>] ? __kmalloc+0x20c/0x220
  [<ffffffffa02ca94d>] device_write+0x30d/0x7d0 [dlm]
  [<ffffffff8105ea30>] ? default_wake_function+0x0/0x20
  [<ffffffff8120c646>] ? security_file_permission+0x16/0x20
  [<ffffffff81176918>] vfs_write+0xb8/0x1a0
  [<ffffffff810d4932>] ? audit_syscall_entry+0x272/0x2a0
  [<ffffffff81177321>] sys_write+0x51/0x90
  [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b

I tried rebooting, but the shutdown stalled while stopping rgmanager. I 
fenced the node, with the same outcome.

黃曉偉 | 24 May 06:22

Is it possible to use quorum in CTDB to prevent split-brain and remove the lockfile from the cluster file system?

Hello list,

CTDB uses a lockfile in the cluster file system to prevent split-brain.
This is a good design as long as every node in the cluster can mount the cluster file system (e.g. GPFS/GFS/GlusterFS), and CTDB works well under that assumption.
However, when split-brain happens, the disconnected private network usually violates this assumption.
For example, say we have four nodes (A, B, C, D) in the cluster, with GlusterFS as the backend.
GlusterFS and CTDB on all nodes communicate with each other over the private network, and CTDB manages the public network.
If node A is disconnected from the private network, the cluster splits into group (A) and group (B,C,D).
A recovery master election is triggered once CTDB decides that nodes are disconnected, i.e. CTDB elects a new recovery master for each group after 26 seconds (KeepaliveInterval*KeepaliveLimit+1 by default).
Node A then becomes the recovery master of group (A), and some node (e.g. B) becomes the recovery master of group (B,C,D).
Now A and B will both try to lock the lockfile, but the GlusterFS nodes also talk to each other over the private network.
This is a big problem: whether the lockfile can be locked depends on the lock implementation and the disconnect detection of GlusterFS (or whichever cluster file system is used). As far as I know, GlusterFS decides that a node is disconnected after 42 seconds and only then releases its locks. In this configuration, nodes A and B will ban themselves, and the newly elected recovery master will ban itself as well. That is a really bad outcome, and it means that with the lockfile design we cannot treat the cluster file system as a black box.

Hence, I have an idea for building split-brain prevention into CTDB without a lockfile.
Using a quorum to ban a node might be an option, so I made a small modification to the CTDB source code.
The modification checks, in main_loop of server/ctdb_recoverd.c, whether more than (nodemap->num)/2 nodes are connected.
If not, the node bans itself and logs the error "Node %u in the group without quorum".

In server/ctdb_recoverd.c:
static void main_loop(struct ctdb_context *ctdb, struct ctdb_recoverd *rec, TALLOC_CTX *mem_ctx)
...
        /* count how many active nodes there are */
        rec->num_active    = 0;
        rec->num_connected = 0;
        for (i=0; i<nodemap->num; i++) {
                if (!(nodemap->nodes[i].flags & NODE_FLAGS_INACTIVE)) {
                        rec->num_active++;
                }
                if (!(nodemap->nodes[i].flags & NODE_FLAGS_DISCONNECTED)) {
                        rec->num_connected++;
                }
        }

+       if (rec->num_connected < ((nodemap->num)/2+1)){
+               DEBUG(DEBUG_ERR, ("Node %u in the group without quorum\n", pnn));
+               ctdb_ban_node(rec, pnn, ctdb->tunable.recovery_ban_period);
+       }

In my tests (with more than 3 nodes), this modification seems to prevent split-brain without a lockfile.
Does this modification cause any side effects, or is it a flawed design?
I would appreciate any answers or input from the list.

Thanks,
Az
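
Editor's sketch of the majority rule used in the patch above, written in shell purely for illustration. It assumes only the condition shown in the diff (a node bans itself when num_connected < num/2 + 1, using integer division); has_quorum and describe_split are hypothetical helper names, not CTDB functions.

```shell
#!/bin/bash
# has_quorum CONNECTED TOTAL
# Succeeds when CONNECTED is a strict majority of TOTAL, mirroring the
# patch's condition: ban if num_connected < num/2 + 1 (integer division).
has_quorum() {
    local connected="$1" total="$2"
    [ "$connected" -ge $(( total / 2 + 1 )) ]
}

# Print what each side of a partition would do in a cluster of TOTAL nodes.
# Arguments after TOTAL are the sizes of the partition's groups.
describe_split() {
    local total="$1" side
    shift
    for side in "$@"; do
        if has_quorum "$side" "$total"; then
            echo "$side of $total connected: keeps running"
        else
            echo "$side of $total connected: bans itself"
        fi
    done
}

describe_split 4 1 3   # the (A) / (B,C,D) split from the example
describe_split 4 2 2   # an even split: neither half has a majority
```

With four nodes the threshold is 3, so in the (A) / (B,C,D) case only A bans itself. In an even 2/2 split the same arithmetic makes both halves ban themselves, and in a two-node cluster a lone survivor never reaches the threshold of 2. Those cases may be worth testing before relying on the check.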

Randy Zagar | 16 May 19:18

Re: Linux-cluster Digest, Vol 97, Issue 5

Also, it looks like the resource manager tries to disable the IP address when it's a child of the nfsclient resource.  Is that going to be a problem when I have 16 NFS exports hosted on a single IP?

-RZ

On 05/16/2012 11:00 AM, fdinitto <at> redhat.com wrote:
> On 05/15/2012 07:33 PM, Randy Zagar wrote:
> > <resources>
> >   <ip address="192.168.1.1" monitor_link="1"/>
> >   <ip address="192.168.1.2" monitor_link="1"/>
> >   <ip address="192.168.1.3" monitor_link="1"/>
> >   <fs device="/dev/cvg00/volume01" force_fsck="0" force_unmount="1" fsid="49388" fstype="ext3" mountpoint="/lvm/volume01" name="volume01" self_fence="0"/>
> >   <fs device="/dev/cvg00/volume02" force_fsck="0" force_unmount="1" fsid="58665" fstype="ext3" mountpoint="/lvm/volume01" name="volume01" self_fence="0"/>
> >   <fs device="/dev/cvg00/volume03" force_fsck="0" force_unmount="1" fsid="61028" fstype="ext3" mountpoint="/lvm/volume01" name="volume01" self_fence="0"/>
> >   <nfsclient allow_recover="1" name="local-subnet" options="rw,insecure" target="192.168.1.0/24"/>
> > </resources>
>
> For the <fs resources you want nfslock="1" option too.
>
> > <service autostart="1" domain="nfs1-domain" exclusive="0" name="nfs1" nfslock="1" recovery="relocate">
> >   <ip ref="192.168.1.1">
> >     <fs __independent_subtree="1" ref="volume01">
> >       <nfsexport name="nfs-volume01">
> >         <nfsclient name=" " ref="local-subnet"/>
> >       </nfsexport>
> >     </fs>
> >   </ip>
>
> For all services you need to change the order.
>
> <fs..
>   <nfsexport..
>     <nfsclient..
>       <ip..
>     </nfsclient..
>   </nfsexport..
> </fs
>
> This solves different issues at startup, relocation and recovery
>
> Also note that there is known limitation in nfsd (both rhel5/6) that
> could cause some problems in some conditions in your current
> configuration. A permanent fix is being worked on atm.
>
> Without extreme details, you might have 2 of those services running on
> the same node and attempting to relocate one of them can fail because
> the fs cannot be unmounted. This is due to nfsd holding a lock (at
> kernel level) to the FS. Changing config to the suggested one, mask the
> problem pretty well, but more testing for a real fix is in progress.
>
> Fabio

--
Randy Zagar, Sr. Unix Systems Administrator
E-mail: zagar <at> arlut.utexas.edu
Applied Research Laboratories, Univ. of Texas at Austin
Phone: 512 835-3131

Randy Zagar | 16 May 19:02

Re: Linux-cluster Digest, Vol 97, Issue 5

Are you sure that nfslock="1" is a valid option for "<fs ...>"?

There doesn't appear to be a way to add that through LUCI, which means I'll have to make and propagate those changes manually.  I used to do this in EL5
/sbin/ccs_tool update /etc/cluster/cluster.conf
but it looks like it's handled differently now.

How?

-RZ
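
Editor's note on the "How?" above, offered as a sketch and not as a verified procedure for this cluster. On EL6 the usual sequence, as far as I know, is to edit cluster.conf by hand, raise config_version, validate the file with ccs_config_validate, and push it to the other nodes with cman_tool version -r. Please check that against the documentation for your release. The helper below only shows the version bump on a scratch copy; bump_config_version is a hypothetical name, not a tool shipped with the cluster packages.

```shell
#!/bin/bash
# bump_config_version FILE
# Hypothetical helper: increments the config_version attribute of the
# <cluster ...> element in FILE by one.
bump_config_version() {
    local conf="$1" cur tmp
    # Read the current integer version from the <cluster ...> element.
    cur=$(sed -n 's/.*<cluster[^>]*config_version="\([0-9][0-9]*\)".*/\1/p' "$conf" | head -n 1)
    [ -n "$cur" ] || return 1
    # Rewrite only that attribute, going through a temporary file.
    tmp=$(mktemp) || return 1
    sed "s/config_version=\"$cur\"/config_version=\"$((cur + 1))\"/" "$conf" > "$tmp" &&
        cat "$tmp" > "$conf"
    rm -f "$tmp"
}

# Demonstrate on a scratch copy so nothing under /etc/cluster is touched.
demo_conf=$(mktemp)
printf '%s\n' \
    '<?xml version="1.0"?>' \
    '<cluster alias="ha-nfs-el5" config_version="357" name="ha-nfs-el5">' \
    '</cluster>' > "$demo_conf"
bump_config_version "$demo_conf"
new_version=$(sed -n 's/.*config_version="\([0-9]*\)".*/\1/p' "$demo_conf")
echo "config_version is now $new_version"
rm -f "$demo_conf"

# On a real EL6 node the follow-up would be (not run here):
#   ccs_config_validate
#   cman_tool version -r
```

The two commented commands are left out of the runnable part on purpose, since they act on the live cluster configuration.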

On 05/16/2012 11:00 AM, fdinitto <at> redhat.com wrote:
> On 05/15/2012 07:33 PM, Randy Zagar wrote:
> > <resources>
> >   <ip address="192.168.1.1" monitor_link="1"/>
> >   <ip address="192.168.1.2" monitor_link="1"/>
> >   <ip address="192.168.1.3" monitor_link="1"/>
> >   <fs device="/dev/cvg00/volume01" force_fsck="0" force_unmount="1" fsid="49388" fstype="ext3" mountpoint="/lvm/volume01" name="volume01" self_fence="0"/>
> >   <fs device="/dev/cvg00/volume02" force_fsck="0" force_unmount="1" fsid="58665" fstype="ext3" mountpoint="/lvm/volume01" name="volume01" self_fence="0"/>
> >   <fs device="/dev/cvg00/volume03" force_fsck="0" force_unmount="1" fsid="61028" fstype="ext3" mountpoint="/lvm/volume01" name="volume01" self_fence="0"/>
> >   <nfsclient allow_recover="1" name="local-subnet" options="rw,insecure" target="192.168.1.0/24"/>
> > </resources>
>
> For the <fs resources you want nfslock="1" option too.
>
> > <service autostart="1" domain="nfs1-domain" exclusive="0" name="nfs1" nfslock="1" recovery="relocate">
> >   <ip ref="192.168.1.1">
> >     <fs __independent_subtree="1" ref="volume01">
> >       <nfsexport name="nfs-volume01">
> >         <nfsclient name=" " ref="local-subnet"/>
> >       </nfsexport>
> >     </fs>
> >   </ip>
>
> For all services you need to change the order.
>
> <fs..
>   <nfsexport..
>     <nfsclient..
>       <ip..
>     </nfsclient..
>   </nfsexport..
> </fs
>
> This solves different issues at startup, relocation and recovery
>
> Also note that there is known limitation in nfsd (both rhel5/6) that
> could cause some problems in some conditions in your current
> configuration. A permanent fix is being worked on atm.
>
> Without extreme details, you might have 2 of those services running on
> the same node and attempting to relocate one of them can fail because
> the fs cannot be unmounted. This is due to nfsd holding a lock (at
> kernel level) to the FS. Changing config to the suggested one, mask the
> problem pretty well, but more testing for a real fix is in progress.
>
> Fabio

--
Randy Zagar, Sr. Unix Systems Administrator
E-mail: zagar <at> arlut.utexas.edu
Applied Research Laboratories, Univ. of Texas at Austin
Phone: 512 835-3131

Randy Zagar | 15 May 19:33

Re: RHEL/CentOS-6 HA NFS Configuration Question

To All,

Looks like I got nicked by Occam's Razor when I "simplified" my cluster config file... :-)   A "less simplified" version is below.

My question still stands, however.  What does "cluster.conf" look like if you're trying to deploy a "highly available" NFS configuration?  And, again, by "highly available" I mean that NFS clients never get the dreaded "stale nfs file handle" message unless the entire cluster has failed.

-RZ

p.s.  A better, but still simplified, cluster.conf for EL5.
<?xml version="1.0"?>
<cluster alias="ha-nfs-el5" config_version="357" name="ha-nfs-el5">
  <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
  <clusternodes>
    <clusternode name="node01.arlut.utexas.edu" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="node01-ilo"/>
        </method>
        <method name="2">
          <device name="sanbox01" port="0"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node02.arlut.utexas.edu" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="node02-ilo"/>
        </method>
        <method name="2">
          <device name="sanbox02" port="0"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node03.arlut.utexas.edu" nodeid="3" votes="1">
      <fence>
        <method name="1">
          <device name="node03-ilo"/>
        </method>
        <method name="2">
          <device name="sanbox03" port="0"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman/>
  <fencedevices>
    <fencedevice agent="fence_sanbox2" ipaddr="sanbox01.arlut.utexas.edu" login="admin" name="sanbox01" passwd="password"/>
    <fencedevice agent="fence_sanbox2" ipaddr="sanbox02.arlut.utexas.edu" login="admin" name="sanbox02" passwd="password"/>
    <fencedevice agent="fence_sanbox2" ipaddr="sanbox03.arlut.utexas.edu" login="admin" name="sanbox03" passwd="password"/>
    <fencedevice agent="fence_ilo" hostname="node01-ilo" login="Administrator" name="node01-ilo" passwd="DUMMY"/>
    <fencedevice agent="fence_ilo" hostname="node02-ilo" login="Administrator" name="node02-ilo" passwd="DUMMY"/>
    <fencedevice agent="fence_ilo" hostname="node03-ilo" login="Administrator" name="node03-ilo" passwd="DUMMY"/>
  </fencedevices>
  <rm>
    <failoverdomains>
      <failoverdomain name="nfs1-domain" nofailback="1" ordered="1" restricted="1">
        <failoverdomainnode name="node01.arlut.utexas.edu" priority="1"/>
        <failoverdomainnode name="node02.arlut.utexas.edu" priority="2"/>
        <failoverdomainnode name="node03.arlut.utexas.edu" priority="3"/>
      </failoverdomain>
      <failoverdomain name="nfs2-domain" nofailback="1" ordered="1" restricted="1">
        <failoverdomainnode name="node01.arlut.utexas.edu" priority="3"/>
        <failoverdomainnode name="node02.arlut.utexas.edu" priority="1"/>
        <failoverdomainnode name="node03.arlut.utexas.edu" priority="2"/>
      </failoverdomain>
      <failoverdomain name="nfs3-domain" nofailback="1" ordered="1" restricted="1">
        <failoverdomainnode name="node01.arlut.utexas.edu" priority="2"/>
        <failoverdomainnode name="node02.arlut.utexas.edu" priority="3"/>
        <failoverdomainnode name="node03.arlut.utexas.edu" priority="1"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <ip address="192.168.1.1" monitor_link="1"/>
      <ip address="192.168.1.2" monitor_link="1"/>
      <ip address="192.168.1.3" monitor_link="1"/>
      <fs device="/dev/cvg00/volume01" force_fsck="0" force_unmount="1" fsid="49388" fstype="ext3" mountpoint="/lvm/volume01" name="volume01" self_fence="0"/>
      <fs device="/dev/cvg00/volume02" force_fsck="0" force_unmount="1" fsid="58665" fstype="ext3" mountpoint="/lvm/volume01" name="volume01" self_fence="0"/>
      <fs device="/dev/cvg00/volume03" force_fsck="0" force_unmount="1" fsid="61028" fstype="ext3" mountpoint="/lvm/volume01" name="volume01" self_fence="0"/>
      <nfsclient allow_recover="1" name="local-subnet" options="rw,insecure" target="192.168.1.0/24"/>
    </resources>
    <service autostart="1" domain="nfs1-domain" exclusive="0" name="nfs1" nfslock="1" recovery="relocate">
      <ip ref="192.168.1.1">
        <fs __independent_subtree="1" ref="volume01">
          <nfsexport name="nfs-volume01">
            <nfsclient name=" " ref="local-subnet"/>
          </nfsexport>
        </fs>
      </ip>
    </service>
    <service autostart="1" domain="nfs2-domain" exclusive="0" name="nfs2" nfslock="1" recovery="relocate">
      <ip ref="192.168.1.2">
        <fs __independent_subtree="1" ref="volume02">
          <nfsexport name="nfs-volume02">
            <nfsclient name=" " ref="local-subnet"/>
          </nfsexport>
        </fs>
      </ip>
    </service>
    <service autostart="1" domain="nfs3-domain" exclusive="0" name="nfs3" nfslock="1" recovery="relocate">
      <ip ref="192.168.1.3">
        <fs __independent_subtree="1" ref="volume03">
          <nfsexport name="nfs-volume03">
            <nfsclient name=" " ref="local-subnet"/>
          </nfsexport>
        </fs>
      </ip>
    </service>
  </rm>
</cluster>
--
Randy Zagar, Sr. Unix Systems Administrator
E-mail: zagar <at> arlut.utexas.edu
Applied Research Laboratories, Univ. of Texas at Austin
Phone: 512 835-3131

Randy Zagar | 14 May 20:42

RHEL/CentOS-6 HA NFS Configuration Question

I have an existing CentOS-5 cluster I've configured for High-Availability NFS (v3).  Everything is working fine.  I've included a simplified cluster.conf file below.

I originally started with 3 file servers that were not clustered.  I converted to a clustered configuration where my NFS Clients never get "stale nfs" error messages.  When a node failed, all NFS exports (and their associated IP address) would  move to another system faster than my clients could time out.

I understand that changes to the portmapper in EL6 and NFSv4 make it much more difficult to configure HA-NFS and, so far, I have not seen any good documentation on how to configure a HA-NFS configuration in EL6.

Does anyone have any suggestions, or links to documentation that you can send me?

-RZ

p.s.  Simplified cluster.conf file for EL5...
<?xml version="1.0"?>
<cluster alias="ha-nfs-el5" config_version="357" name="ha-nfs-el5">
  <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
  <clusternodes>
    <clusternode name="node01.arlut.utexas.edu" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="node01-ilo"/>
        </method>
        <method name="2">
          <device name="sanbox01" port="0"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node02.arlut.utexas.edu" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="node02-ilo"/>
        </method>
        <method name="2">
          <device name="sanbox02" port="0"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node03.arlut.utexas.edu" nodeid="3" votes="1">
      <fence>
        <method name="1">
          <device name="node03-ilo"/>
        </method>
        <method name="2">
          <device name="sanbox03" port="0"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman/>
  <fencedevices>
    <fencedevice agent="fence_sanbox2" ipaddr="sanbox01.arlut.utexas.edu" login="admin" name="sanbox01" passwd="password"/>
    <fencedevice agent="fence_sanbox2" ipaddr="sanbox02.arlut.utexas.edu" login="admin" name="sanbox02" passwd="password"/>
    <fencedevice agent="fence_sanbox2" ipaddr="sanbox03.arlut.utexas.edu" login="admin" name="sanbox03" passwd="password"/>
    <fencedevice agent="fence_ilo" hostname="node01-ilo" login="Administrator" name="node01-ilo" passwd="DUMMY"/>
    <fencedevice agent="fence_ilo" hostname="node02-ilo" login="Administrator" name="node02-ilo" passwd="DUMMY"/>
    <fencedevice agent="fence_ilo" hostname="node03-ilo" login="Administrator" name="node03-ilo" passwd="DUMMY"/>
  </fencedevices>
  <rm>
    <failoverdomains>
      <failoverdomain name="nfs1-domain" nofailback="1" ordered="1" restricted="1">
        <failoverdomainnode name="node01.arlut.utexas.edu" priority="1"/>
        <failoverdomainnode name="node02.arlut.utexas.edu" priority="2"/>
        <failoverdomainnode name="node03.arlut.utexas.edu" priority="3"/>
      </failoverdomain>
      <failoverdomain name="nfs2-domain" nofailback="1" ordered="1" restricted="1">
        <failoverdomainnode name="node01.arlut.utexas.edu" priority="3"/>
        <failoverdomainnode name="node02.arlut.utexas.edu" priority="1"/>
        <failoverdomainnode name="node03.arlut.utexas.edu" priority="2"/>
      </failoverdomain>
      <failoverdomain name="nfs3-domain" nofailback="1" ordered="1" restricted="1">
        <failoverdomainnode name="node01.arlut.utexas.edu" priority="2"/>
        <failoverdomainnode name="node02.arlut.utexas.edu" priority="3"/>
        <failoverdomainnode name="node03.arlut.utexas.edu" priority="1"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <ip address="192.168.1.1" monitor_link="1"/>
      <ip address="192.168.1.2" monitor_link="1"/>
      <ip address="192.168.1.3" monitor_link="1"/>
      <fs device="/dev/cvg00/volume01" force_fsck="0" force_unmount="1" fsid="49388" fstype="ext3" mountpoint="/lvm/volume01" name="volume01" self_fence="0"/>
      <fs device="/dev/cvg00/volume02" force_fsck="0" force_unmount="1" fsid="58665" fstype="ext3" mountpoint="/lvm/volume01" name="volume01" self_fence="0"/>
      <fs device="/dev/cvg00/volume03" force_fsck="0" force_unmount="1" fsid="61028" fstype="ext3" mountpoint="/lvm/volume01" name="volume01" self_fence="0"/>
      <nfsclient allow_recover="1" name="local-subnet" options="rw,insecure" target="192.168.1.0/24"/>
    </resources>
    <service autostart="1" domain="nfs1-domain" exclusive="0" name="nfs1" nfslock="1" recovery="relocate">
      <ip ref="192.168.1.1">
        <fs __independent_subtree="1" ref="volume01">
          <nfsexport name="nfs-cvg00-brazos02">
            <nfsclient name=" " ref="local-subnet"/>
          </nfsexport>
        </fs>
      </ip>
    </service>
    <service autostart="1" domain="nfs2-domain" exclusive="0" name="nfs2" nfslock="1" recovery="relocate">
      <ip ref="192.168.1.2">
        <fs __independent_subtree="1" ref="volume02">
          <nfsexport name="nfs-sdd01-data02">
            <nfsclient name=" " ref="local-subnet"/>
          </nfsexport>
        </fs>
      </ip>
    </service>
    <service autostart="1" domain="nfs3-domain" exclusive="0" name="nfs3" nfslock="1" recovery="relocate">
      <ip ref="192.168.1.3">
        <fs __independent_subtree="1" ref="volume03">
            <nfsclient name=" " ref="local-subnet"/>
          </nfsexport>
        </fs>
      </ip>
    </service>
  </rm>
</cluster>
--
Randy Zagar, Sr. Unix Systems Administrator
E-mail: zagar <at> arlut.utexas.edu
Applied Research Laboratories, Univ. of Texas at Austin
Phone: 512 835-3131

RHCS RHEL 6 GEO CLUSTER

Hi,

I would like to know whether a geo cluster is supported with RHCS on RHEL 6 and, if so, in which cases.

Thanks for your response.



--
Carlos Alberto Ramírez Rendón - RHCE -RHCDS- RHCVA - RHCI
Red Hat Solutions Architect
Cel.: +57-1+310-879898
gpg: 1024R/F9220C2E  6742 3474 CF17 A82C 1888 5D4C F460 15DC F922 0C2E

Masatake YAMATO | 7 May 11:08

[PATCH] typo in fence_kdump_send.8

Signed-off-by: Masatake YAMATO<yamato <at> redhat.com>

diff --git a/fence/agents/kdump/fence_kdump_send.8 b/fence/agents/kdump/fence_kdump_send.8
index 4cec124..ab95836 100644
--- a/fence/agents/kdump/fence_kdump_send.8
+++ b/fence/agents/kdump/fence_kdump_send.8
@@ -16,7 +16,7 @@ kdump kernel after a cluster node has encountered a kernel panic. Once
 the cluster node has entered the kdump crash recovery service,
 \fIfence_kdump_send\fP will periodically send messages to all cluster
 nodes. When the \fIfence_kdump\fP agent receives a valid message from
-the failed not, fencing is complete.
+the failed node, fencing is complete.
 .SH OPTIONS
 .TP
 .B -p, --ipport=\fIPORT\fP

Ralf Aumueller | 3 May 12:36

RHEL6 Cluster: Update corosync RPMs

Hello,

recently there was an update of the corosync and corosynclib RPMs. Is it safe to
just install these updates on a running two-node cluster, or do I have to use a
special procedure (e.g. stop cluster services on node2; apply updates and reboot
node2; move services to node2 and update node1)?

Regards,
Ralf

GouNiNi | 25 Apr 16:06

Clustering NFS4 with best practices (bind)

Hello,

My question is similar to
http://www.redhat.com/archives/linux-cluster/2007-April/msg00125.html but that was in 2007.
I have a cluster with HA-LVM + ext3 (no GFS). I need to bind-mount some directories into my chroot NFS4 directory for
various technical reasons.

I tried many configurations without success. Here is one:

<resources>
        <ip address="XX.XX.XX.XX/26" monitor_link="1"/>
        <lvm lv_name="lv_applis_foobar" name="lv_applis_foobar" vg_name="vg_applis_foobar"/>
        <fs device="/dev/vg_applis_foobar/lv_applis_foobar" force_unmount="1" fstype="ext3" mountpoint="/applis/foobar" name="fs_applis_foobar"/>
        <fs device="/applis/foobar" force_unmount="1" mountpoint="/exports/applis/foobar" name="bind_applis_foobar" options="bind"/>
        <nfsexport name="/exports/applis/foobar"/>
        <nfsclient fsid="100" name="exp_/exports/applis/foobar" options="rw,sync" path="/exports/applis/foobar" target="*.pma-dstage"/>
</resources>
<service autostart="1" domain="data" name="files.foobar.com" nfslock="1" recovery="relocate">
        <lvm ref="lv_applis_foobar">
                <fs ref="fs_applis_foobar">
                        <fs ref="bind_applis_foobar">
                                <nfsexport ref="/exports/applis/foobar">
                                        <nfsclient ref="exp_/exports/applis/foobar"/>
                                </nfsexport>
                        </fs>
                </fs>
        </lvm>
        <ip ref="XX.XX.XX.XX/26"/>
</service>

The logs say:

Apr 24 17:53:08 hostname rgmanager[13264]: [fs] start_filesystem: Could not match /applis/foobar with a real device
Apr 24 17:53:15 hostanme rgmanager[11094]: start on fs "bind_applis_foobar" returned 2 (invalid argument(s))

Has anyone already used the bind option in cluster.conf?

Regards,

---
Jean-Daniel Bonnetot

Script fails to run under RHCS, but succeeds when run manually

Hi,

The tomcat_agent script fails to run when RHCS starts it, but I can run it successfully by hand. Could you please check my script and tell me what the problem is? Below are the cluster configuration and the script:

[root <at> db05 init.d]# cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster config_version="4" name="NLS_Test">
        <fence_daemon post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="db05" nodeid="1" votes="1">
                        <fence>
                                <method name="1"/>
                        </fence>
                </clusternode>
                <clusternode name="db07" nodeid="2" votes="1">
                        <fence>
                                <method name="1"/>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices>
                <fencedevice agent="fence_ipmilan" auth="none" ipaddr="10.69.128.25" login="test" name="ilo_db05" passwd="Administrator"/>
                <fencedevice agent="fence_ipmilan" auth="none" ipaddr="10.69.128.27" login="test" name="ilo_db07" passwd="Administrator"/>
        </fencedevices>
        <rm>
                <failoverdomains>
                        <failoverdomain name="ALLFOD" ordered="1" restricted="1">
                                <failoverdomainnode name="db05" priority="3"/>
                                <failoverdomainnode name="db07" priority="4"/>
                        </failoverdomain>
                        <failoverdomain name="ODDFOD" ordered="1" restricted="1">
                                <failoverdomainnode name="db05" priority="3"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <script file="/etc/init.d/tomcat_agent" name="tomcat_agent"/>
                </resources>
                <service autostart="1" domain="allFOD" name="tomcat" recovery="restart">
                        <ip address="198.18.27.125/24" monitor_link="1"/>
                        <fs device="/dev/mapper/nls_testp2" force_fsck="1" force_unmount="0" fstype="ext3" mountpoint="/opt/nls/float/tomcat" name="tomcat" options="" self_fence="0"/>
                        <script ref="tomcat_agent"/>
                </service>
        </rm>
</cluster>

[root <at> db05 init.d]# cat tomcat_agent
#!/bin/bash
# file: tomcat_agent
# desc: Tomcat service agent, invoked by RHCS

source /etc/init.d/core_agent

TOMCAT_DIR=`ls ${_tomcat_home} | grep tomcat`
TOMCAT_BIN_DIR="${_tomcat_home}/${TOMCAT_DIR}/bin"
RETVAL=1
TOMCAT_STOP="./shutdown.sh"
TOMCAT_START="./startup.sh"

status() {
    echo "status test" >> /tmp/wxg.txt
    #TODO: to monitor the port or whatever else in drop1a
    ps aux | grep -v grep | grep ${TOMCAT_DIR} 2>&1 > /dev/null
    return $?
}

start() {
    echo "start" >>/tmp/wxg.txt
    sudo -i -u nls sh -c "/opt/nls/float/tomcat/apache-tomcat-6.0.33/bin/startup.sh" 2>&1 > /dev/null
    sleep 3
    status
    return $?
}

stop() {
    echo "stop" >>/tmp/wxg.txt
    sudo -i -u nls sh -c "/opt/nls/float/tomcat/apache-tomcat-6.0.33/bin/shutdown.sh" 2>&1 > /dev/null
    sleep 3
    status
    if [ $? -ne 0 ]; then return 0; fi
}

case "$1" in
        start)
                start
                RETVAL=$?
        ;;
        stop)
                stop
                RETVAL=$?
        ;;
        status)
                status
                RETVAL=$?
        ;;
        restart)
                echo $1
                stop
                start
                RETVAL=$?
        ;;
        *)
                echo $1
                logger "Usage: $0 {start|stop|status|restart}"
                RETVAL=2
        ;;
esac

exit ${RETVAL}
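
Editor's sketch of two shell behaviours that the script above relies on and that may not do what it intends. First, "2>&1 > /dev/null" still lets stderr through, because stderr is duplicated onto the old stdout before stdout is redirected. Second, stop() as written returns 0 even when status reports the process is still running, since an "if" whose condition is false has exit status 0. Whether either point explains the failure under rgmanager cannot be determined from the post. The names noisy, is_running and stop_checked are hypothetical stand-ins, not part of the original agent.

```shell
#!/bin/bash
# Redirection order: "2>&1 > /dev/null" first points stderr at the current
# stdout, then sends only stdout to /dev/null, so stderr still escapes.
noisy() { echo "out"; echo "err" >&2; }

leaked=$( noisy 2>&1 > /dev/null )     # captures the stderr text
silenced=$( noisy > /dev/null 2>&1 )   # captures nothing

# A stop() variant that reports failure when the process is still there.
# is_running is a hypothetical stand-in for the script's status() check.
RUNNING=1
is_running() { [ "$RUNNING" -eq 1 ]; }

stop_checked() {
    # (the real shutdown command would run here)
    if is_running; then
        return 1    # still running: tell the caller that stop failed
    fi
    return 0
}

# Case 1: the process is still up after the stop attempt.
if stop_checked; then still_up_rc=0; else still_up_rc=$?; fi
# Case 2: the process has gone away.
RUNNING=0
if stop_checked; then stopped_rc=0; else stopped_rc=$?; fi

echo "leaked='$leaked' silenced='$silenced' rc=$still_up_rc/$stopped_rc"
```

In the agent itself, the equivalent changes would be writing "> /dev/null 2>&1" and giving stop() an explicit non-zero return when status still succeeds.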

