Megan . | 19 Feb 14:50 2015

Number of GFS2 mounts

Good Morning!

We have an 11-node CentOS 6.6 cluster.  We are using it
to share SAN mounts between servers (GFS2 over iSCSI with LVM).  We
have a requirement to have 33 GFS2 mounts shared on the cluster (crazy,
I know).  Are there any limitations on doing this?  I couldn't find
anything in the documentation about the number of mounts, just the size
of the mounts.  Is there anything I can do to tune our cluster to handle
this requirement?
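
For context, each filesystem would be created and mounted roughly like
this (a sketch; the cluster, volume, and mount names are placeholders):

    # one journal per node that may mount the filesystem (11 here)
    mkfs.gfs2 -p lock_dlm -t mycluster:data01 -j 11 /dev/vg_san/lv_data01
    # noatime/nodiratime avoid taking a glock for every read
    mount -t gfs2 -o noatime,nodiratime /dev/vg_san/lv_data01 /mnt/data01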

Thanks!


Equipe R&S Netplus | 13 Feb 17:39 2015

NFS HA

Hello,

I would like to set up an NFS cluster.
With RHCS, I use the resource agent "nfsserver".

But I have a question:
Is it possible to manage an NFS server such that the NFS clients "will not be aware of any loss of service"?  In other words, if the NFS service fails over, the clients should not see any change.

Currently, when there is a failover, I can't access the NFS server anymore;
I get the message "Stale NFS file handle".
In the client's NFS log:
<<
NFS: server X.X.X.X error: fileid changed
>>
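
For reference, my understanding (unverified) is that "fileid changed"
appears when the file handles held by clients stop matching after the
export moves, because the handle encodes the underlying device.  A
commonly suggested mitigation, sketched here with a placeholder path,
is to pin a stable fsid on the export on both nodes:

    # /etc/exports -- identical on both nodes; fsid pins the
    # file-handle identity instead of the device number
    /export/data  *(rw,sync,fsid=1)

If the OCF heartbeat:nfsserver agent is in use, its nfs_shared_infodir
parameter can also keep the NFS state (rmtab, statd data) on the shared
storage that fails over with the service.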

Is there any solution, please?
Thank you.

Vallevand, Mark K | 13 Feb 17:16 2015

Is there a way for a resource agent to know the previous node on which it was active?

Is there a way for a resource agent to know the previous node on which it was active?
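
The best I have come up with so far is to have the agent record this
itself.  A sketch (untested, with a placeholder path on shared storage):

    # in the agent's start action:
    STATEFILE=/shared/myresource.lastnode    # placeholder path
    prev_node=$(cat "$STATEFILE" 2>/dev/null)
    echo "previously active on: ${prev_node:-unknown}"
    hostname > "$STATEFILE"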

 

Regards.
Mark K Vallevand   Mark.Vallevand@Unisys.com
Outside of a dog, a book is man's best friend.
 Inside of a dog, it's too dark to read.  - Groucho

THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY MATERIAL and is thus for use only by the intended recipient. If you received this in error, please contact the sender and delete the e-mail and its attachments from all computers.

Masatake YAMATO | 5 Feb 10:57 2015

wrong error messages in fence-virt

Is fence-virt still maintained?
I cannot find its git repository.
There is one at sf.net, but it looks obsolete.

With my broken configuration, I got the following debug output from
fence_xvm...

    #  fence_xvm -H targethost -o status -dddddd
    Debugging threshold is now 6
    -- args @ 0x7fff762de810 --
    ...
    Opening /dev/urandom
    Sending to 225.0.0.12 via 192.168.122.113
    Waiting for connection from XVM host daemon.
    Issuing TCP challenge
>   read: Is a directory
    Invalid response to challenge
    Operation failed

Look at the line marked with '>'.  The error message seems strange to me
because, as far as I can tell from the source code, read() is called on a
socket connected to fence_virtd.

So I walked through the code and found two bugs:

1. The results of the read() (and write()) system calls are not checked
   correctly:

   perror() is called even if the call is successful.

2. "read" is passed as the argument to perror() when the write() system call fails.

Neither bug is critical if fence_virtd is configured well.
However, users may be confused when it is not.

The following patch is not tested at all, but it illustrates the fixes
listed above.

Masatake YAMATO

--- fence-virt-0.3.2/common/simple_auth.c	2013-11-05 01:08:35.000000000 +0900
+++ fence-virt-0.3.2/common/simple_auth.c.new	2015-02-05 18:40:53.471029118 +0900
@@ -260,9 +260,13 @@
 		return 0;
 	}

-	if (read(fd, response, sizeof(response)) < sizeof(response)) {
+	ret = read(fd, response, sizeof(response));
+	if (ret < 0) {
 		perror("read");
 		return 0;
+	} else if (ret < sizeof(response)) {
+		fprintf(stderr, "RESPONSE is too short(%d) in %s\n", ret, __FUNCTION__);
+		return 0;
 	}

 	ret = !memcmp(response, hash, sizeof(response));
@@ -333,7 +337,7 @@
 	HASH_Destroy(h);

 	if (write(fd, hash, sizeof(hash)) < sizeof(hash)) {
-		perror("read");
+		perror("write");
 		return 0;
 	}


cluster lab | 29 Jan 05:50 2015

GFS2: "Could not open" the file on one of the nodes

Hi,

In a two-node cluster, I received two different results from "qemu-img
check" on a single file:

node1 # qemu-img check VMStorage/x.qcow2
No errors were found on the image.

Node2 # qemu-img check VMStorage/x.qcow2
qemu-img: Could not open 'VMStorage/x.qcow2'

All other files are OK, and the cluster works properly.
What is the problem?
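
Things worth comparing on both nodes before assuming corruption (a
sketch; same paths as in the commands above):

    stat VMStorage/x.qcow2    # do size and inode match across nodes?
    lsof VMStorage/x.qcow2    # is a running VM still holding the file open?
    dmesg | grep -i gfs2      # any withdraw or metadata errors logged?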

====
Packages:
kernel: 2.6.32-431.5.1.el6.x86_64
GFS2: gfs2-utils-3.0.12.1-23.el6.x86_64
corosync: corosync-1.4.1-17.el6.x86_64

Best Regards


Vladimir Melnik | 13 Dec 17:04 2014

The file on a GFS2-filesystem seems to be corrupted

Dear colleagues,

I encountered a very strange issue and would be grateful if you would
share your thoughts on it.

I have a qcow2 image located on a GFS2 filesystem on a cluster.
The cluster works fine and there are dozens of other qcow2 images, but,
as far as I can see, one of the images seems to be corrupted.

First of all, it has quite an unusual size:
> stat /mnt/sp1/ac2cb28f-09ac-4ca0-bde1-471e0c7276a0.bak
  File: `/mnt/sp1/ac2cb28f-09ac-4ca0-bde1-471e0c7276a0.bak'
  Size: 7493992262336241664     Blocks: 821710640  IO Block: 4096   regular file
Device: fd06h/64774d    Inode: 220986752   Links: 1
Access: (0744/-rwxr--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2014-10-09 16:25:24.864877839 +0300
Modify: 2014-12-13 14:41:29.335603509 +0200
Change: 2014-12-13 15:52:35.986888549 +0200

By the way, I noticed that the block count actually looks sane
(821710640 blocks x 512 bytes is about 392G, matching the disk size below).

Also, qemu-img can't recognize it as a qcow2 image:
> qemu-img info /mnt/sp1/ac2cb28f-09ac-4ca0-bde1-471e0c7276a0.bak
image: /mnt/sp1/ac2cb28f-09ac-4ca0-bde1-471e0c7276a0.bak
file format: raw
virtual size: 6815746T (7493992262336241664 bytes)
disk size: 392G

The disk size, though, looks more reasonable: the image really should be
about 300-400G, as I remember.

Alas, I can't do anything with this image.  I can't check it with
qemu-img, nor can I convert it to a new image, as qemu-img can't do
anything with it:

> qemu-img convert -p -f qcow2 -O qcow2 /mnt/sp1/ac2cb28f-09ac-4ca0-bde1-471e0c7276a0.bak /mnt/tmp/ac2cb28f-09ac-4ca0-bde1-471e0c7276a0
Could not open '/mnt/sp1/ac2cb28f-09ac-4ca0-bde1-471e0c7276a0.bak': Invalid argument
Could not open '/mnt/sp1/ac2cb28f-09ac-4ca0-bde1-471e0c7276a0.bak'

Has anyone experienced the same issue?  What do you think: is it a qcow2
issue or a GFS2 issue?  What would you do in a similar situation?

Any ideas, hints and comments would be greatly appreciated.

Yes, I have snapshots, which is good, but I wouldn't like to lose today's
changes to the data on that image.  And I'm worried about the filesystem
as a whole: what if something goes wrong when I try to remove that file?
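
I assume the safe first step is an offline check of the filesystem
before touching the file, something like the sketch below (device path
is a placeholder; gfs2 fsck requires the filesystem to be unmounted on
all nodes first), but I would appreciate confirmation:

    umount /mnt/sp1               # on every node of the cluster
    fsck.gfs2 -n /dev/vg/lv_sp1   # read-only pass: report, change nothing
    # fsck.gfs2 -y /dev/vg/lv_sp1 # repair pass, only after reviewing -n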

Thanks to all!

-- 
V.Melnik

P.S. I use CentOS-6 and I have these packages installed:
	qemu-img-0.12.1.2-2.415.el6_5.4.x86_64
	gfs2-utils-3.0.12.1-59.el6_5.1.x86_64
	lvm2-cluster-2.02.100-8.el6.x86_64
	cman-3.0.12.1-59.el6_5.1.x86_64
	clusterlib-3.0.12.1-59.el6_5.1.x86_64
	kernel-2.6.32-431.5.1.el6.x86_64


Jürgen Ladstätter | 2 Dec 11:29 2014

Fencing and deadlocks

Hi guys,

 

we’re running a 9-node cluster with 5 GFS2 mounts.  The cluster is mainly used for load-balancing web-based applications.  Fencing is done with IPMI and works.

Sometimes one server gets fenced but, after rebooting, isn’t able to rejoin the cluster.  This drives up load and the number of open processes, leading to another server being fenced.  That server then isn’t able to rejoin either, and this continues until we lose quorum and have to restart the whole cluster manually.

Sadly this is not reproducible, but it seems to happen more often when there is more write I/O.

 

Since a whole-cluster deadlock rather defeats the point of a cluster, we’d appreciate some input on what we could do or change.

We’re running CentOS 6.6, kernel 2.6.32-504.1.3.el6.x86_64
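
One thing we are considering (a sketch, not yet tested) is raising the
totem token timeout in cluster.conf, so that heavily loaded nodes get
more time before being declared dead; the value is in milliseconds:

    <totem token="30000"/>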

 

Has anyone here tested GFS2 with CentOS 7?  Any known major bugs that could cause deadlocks?

 

Thanks in advance, Jürgen

 

Megan . | 1 Dec 15:16 2014

new cluster acting odd

Good Day,

I'm fairly new to the cluster world, so I apologize in advance for
silly questions.  Thank you for any help.

We decided to use this cluster solution in order to share GFS2 mounts
across servers.  We have a newly set up 7-node cluster that is
acting oddly.  It has 3 VMware guest hosts and 4 physical hosts (Dells
with iDRACs), all running CentOS 6.6.  I have fencing
working (I'm able to run "fence_node <node>" and it fences
successfully).  I do not have the GFS2 mounts in the cluster yet.

When I don't touch the servers, my cluster looks perfect, with all
nodes online.  But when I start testing fencing, I have an odd problem
where I end up with split brain between some of the nodes.  They won't
automatically fence each other when the cluster gets into this state.

In the corosync.log for the node that gets split out, I see the totem
chatter, but it seems confused and just keeps repeating the below over
and over:

Dec 01 12:39:15 corosync [TOTEM ] Retransmit List: 22 24 25 26 27 28 29 2a 2b 2c

Dec 01 12:39:17 corosync [TOTEM ] Retransmit List: 22 24 25 26 27 28 29 2a 2b 2c

Dec 01 12:39:19 corosync [TOTEM ] Retransmit List: 22 24 25 26 27 28 29 2a 2b 2c

Dec 01 12:39:39 corosync [TOTEM ] Retransmit List: 1 3 4 5 6 7 8 9 a b

Dec 01 12:39:39 corosync [TOTEM ] Retransmit List: 1 3 4 5 6 7 8 9 a b
21 23 24 25 26 27 28 29 2a 2b 32
..
..
..
Dec 01 12:54:49 corosync [TOTEM ] Retransmit List: 1 3 4 5 6 7 8 9 a b
1d 1f 20 21 22 23 24 25 26 27 2e 30 31 32 37 38 39 3a 3b 3c

Dec 01 12:54:50 corosync [TOTEM ] Retransmit List: 1 3 4 5 6 7 8 9 a b
1d 1f 20 21 22 23 24 25 26 27 2e 30 31 32 37 38 39 3a 3b 3c

Dec 01 12:54:50 corosync [TOTEM ] Retransmit List: 1 3 4 5 6 7 8 9 a b
1d 1f 20 21 22 23 24 25 26 27 2e 30 31 32 37 38 39 3a 3b 3c

I can manually fence it, and it still comes back online with the same
issue.  I end up having to take the whole cluster down, sometimes
forcing a reboot on some nodes, then bringing it back up.  It takes a
good part of the day just to bring the whole cluster online again.

I used "ccs -h node --sync --activate" and double-checked that
they are all using the same version of the cluster.conf file.

One issue I did notice is that when one of the VMware hosts is
rebooted, its time comes up slightly skewed (6 seconds), but I thought I
read somewhere that a skew that minor shouldn't impact the cluster.

We have multicast enabled on the interfaces:

          UP BROADCAST RUNNING MASTER MULTICAST  MTU:9000  Metric:1

and we have been told by our network team that IGMP snooping is disabled.

With tcpdump I can see the multicast traffic chatter.
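
One option we have not tried yet: my understanding (unverified) is that
cman on RHEL/CentOS 6.2+ can bypass multicast entirely with a
one-attribute change in cluster.conf:

    <cman transport="udpu"/>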

Right now:

[root@data1-uat ~]# clustat
Cluster Status for projectuat @ Mon Dec  1 13:56:39 2014
Member Status: Quorate

 Member Name                                                     ID   Status
 ------ ----                                                     ---- ------
 archive1-uat.domain.com                                1 Online
 admin1-uat.domain.com                                  2 Online
 mgmt1-uat.domain.com                                   3 Online
 map1-uat.domain.com                                    4 Online
 map2-uat.domain.com                                    5 Online
 cache1-uat.domain.com                                  6 Online
 data1-uat.domain.com                                   8 Online, Local

** Has itself as online **
[root@map1-uat ~]# clustat
Cluster Status for projectuat @ Mon Dec  1 13:57:07 2014
Member Status: Quorate

 Member Name                                                     ID   Status
 ------ ----                                                     ---- ------
 archive1-uat.domain.com                                1 Online
 admin1-uat.domain.com                                  2 Online
 mgmt1-uat.domain.com                                   3 Online
 map1-uat.domain.com                                    4 Offline, Local
 map2-uat.domain.com                                    5 Online
 cache1-uat.domain.com                                  6 Online
 data1-uat.domain.com                                   8 Online

[root@cache1-uat ~]# clustat
Cluster Status for projectuat @ Mon Dec  1 13:57:39 2014
Member Status: Quorate

 Member Name                                                     ID   Status
 ------ ----                                                     ---- ------
 archive1-uat.domain.com                                1 Online
 admin1-uat.domain.com                                  2 Online
 mgmt1-uat.domain.com                                   3 Online
 map1-uat.domain.com                                    4 Online
 map2-uat.domain.com                                    5 Online
 cache1-uat.domain.com                                  6 Offline, Local
 data1-uat.domain.com                                   8 Online

[root@mgmt1-uat ~]# clustat
Cluster Status for projectuat @ Mon Dec  1 13:58:04 2014
Member Status: Inquorate

 Member Name                                                     ID   Status
 ------ ----                                                     ---- ------
 archive1-uat.domain.com                                1 Offline
 admin1-uat.domain.com                                  2 Offline
 mgmt1-uat.domain.com                                   3 Online, Local
 map1-uat.domain.com                                    4 Offline
 map2-uat.domain.com                                    5 Offline
 cache1-uat.domain.com                                  6 Offline
 data1-uat.domain.com                                   8 Offline

cman-3.0.12.1-68.el6.x86_64

[root@data1-uat ~]# cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster config_version="66" name="projectuat">
<clusternodes>
<clusternode name="admin1-uat.domain.com" nodeid="2">
<fence>
<method name="fenceadmin1uat">
<device name="vcappliancesoap" port="admin1-uat" ssl="on"
uuid="421df3c4-a686-9222-366e-9a67b25f62b2"/>
</method>
</fence>
</clusternode>
<clusternode name="mgmt1-uat.domain.com" nodeid="3">
<fence>
<method name="fenceadmin1uat">
<device name="vcappliancesoap" port="mgmt1-uat" ssl="on"
uuid="421d5ff5-66fa-5703-66d3-97f845cf8239"/>
</method>
</fence>
</clusternode>
<clusternode name="map1-uat.domain.com" nodeid="4">
<fence>
<method name="fencemap1uat">
<device name="idracmap1uat"/>
</method>
</fence>
</clusternode>
<clusternode name="map2-uat.domain.com" nodeid="5">
<fence>
<method name="fencemap2uat">
<device name="idracmap2uat"/>
</method>
</fence>
</clusternode>
<clusternode name="cache1-uat.domain.com" nodeid="6">
<fence>
<method name="fencecache1uat">
<device name="idraccache1uat"/>
</method>
</fence>
</clusternode>
<clusternode name="data1-uat.domain.com" nodeid="8">
<fence>
<method name="fencedata1uat">
<device name="idracdata1uat"/>
</method>
</fence>
</clusternode>
<clusternode name="archive1-uat.domain.com" nodeid="1">
<fence>
<method name="fenceadmin1uat">
<device name="vcappliancesoap" port="archive1-uat" ssl="on"
uuid="421d16b2-3ed0-0b9b-d530-0b151d81d24e"/>
</method>
</fence>
</clusternode>
</clusternodes>
<fencedevices>
<fencedevice agent="fence_vmware_soap" ipaddr="x.x.x.130"
login="fenceuat" login_timeout="10" name="vcappliancesoap"
passwd_script="/etc/cluster/forfencing.sh" power_timeout="10"
power_wait="30" retry_on="3" shell_timeout="10" ssl="1"/>
<fencedevice agent="fence_drac5" cmd_prompt="admin1-&gt;"
ipaddr="x.x.x.47" login="fenceuat" name="idracdata1uat"
passwd_script="/etc/cluster/forfencing.sh" power_timeout="60"
power_wait="60" retry_on="10" secure="on" shell_timeout="10"/>
<fencedevice agent="fence_drac5" cmd_prompt="admin1-&gt;"
ipaddr="x.x.x.48" login="fenceuat" name="idracdata2uat"
passwd_script="/etc/cluster/forfencing.sh" power_timeout="60"
power_wait="60" retry_on="10" secure="on" shell_timeout="10"/>
<fencedevice agent="fence_drac5" cmd_prompt="admin1-&gt;"
ipaddr="x.x.x.82" login="fenceuat" name="idracmap1uat"
passwd_script="/etc/cluster/forfencing.sh" power_timeout="60"
power_wait="60" retry_on="10" secure="on" shell_timeout="10"/>
<fencedevice agent="fence_drac5" cmd_prompt="admin1-&gt;"
ipaddr="x.x.x.96" login="fenceuat" name="idracmap2uat"
passwd_script="/etc/cluster/forfencing.sh" power_timeout="60"
power_wait="60" retry_on="10" secure="on" shell_timeout="10"/>
<fencedevice agent="fence_drac5" cmd_prompt="admin1-&gt;"
ipaddr="x.x.x.83" login="fenceuat" name="idraccache1uat"
passwd_script="/etc/cluster/forfencing.sh" power_timeout="60"
power_wait="60" retry_on="10" secure="on" shell_timeout="10"/>
<fencedevice agent="fence_drac5" cmd_prompt="admin1-&gt;"
ipaddr="x.x.x.97" login="fenceuat" name="idraccache2uat"
passwd_script="/etc/cluster/forfencing.sh" power_timeout="60"
power_wait="60" retry_on="10" secure="on" shell_timeout="10"/>
</fencedevices>
</cluster>


Rajat | 28 Nov 06:51 2014

Cluster Overhead I/O, Network, Memory, CPU

Hey Team,

Our customer is using RHEL 5.x and RHEL 6.x clusters in their production stack.

The customer is asking whether there is any doc/white paper they can share with their management on what the cluster services consume, as a percentage of the following (one way to sample these is sketched after the list):

Disk     %
Network  %
Memory   %
CPU      %
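
In the absence of such a paper, one way to sample these on a live node,
as a sketch (assuming the sysstat package is installed):

    # CPU and memory usage of the cluster daemons, sampled every 5 seconds
    pidstat -u -r -p $(pgrep -d, 'corosync|dlm_controld|rgmanager') 5
    # disk and network utilization over the same interval
    sar -d -n DEV 5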

Gratitude


Pradipta Singha | 12 Nov 13:10 2014

Deployment of Red Hat Cluster 6 to provide HA for Oracle 11g R2

Hi Team,

I have to setup 2 node  Redhat cluster 6  to provide HA to oracle 11g R2 database with two instance.Kindly help me to setup the cluster.

The shared GFS2 filesystems below (shared by both nodes) are for data files:

/dev/mapper/vg1-lv3                   gfs2   250G  2.2G  248G   1% /u01
/dev/mapper/vg1-lv4                   gfs2   175G  268M  175G   1% /u02
/dev/mapper/vg1-lv5                   gfs2    25G  259M   25G   2% /u03
/dev/mapper/vg1-lv6                   gfs2    25G  259M   25G   2% /u04
/dev/mapper/vg1-lv7                   gfs2    25G  259M   25G   2% /u05
/dev/mapper/vg1-lv8                   gfs2   300G  259M  300G   1% /u06
/dev/mapper/vg1-lv9                   gfs2   300G  1.8G  299G   1% /u07

And the local ext4 filesystems below (local to each node) are for the database binaries on both nodes:
/dev/mapper/vg2-lv1_oracle            ext4    99G  4.5G   89G   5% /oracle -> for the Oracle database instance

/dev/mapper/vg2-lv2_orafmw            ext4    99G   60M   94G   1% /orafmw -> for the application instance

Note: two instances will run, one for the Oracle database and another for the application.
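
So far I only have a skeleton of the rgmanager service section (a
sketch with placeholder names; the oracledb attribute names should be
checked against the resource-agents documentation):

    <rm>
      <failoverdomains>
        <failoverdomain name="oradom" restricted="1">
          <failoverdomainnode name="node1.example.com"/>
          <failoverdomainnode name="node2.example.com"/>
        </failoverdomain>
      </failoverdomains>
      <service domain="oradom" name="oracle-db" recovery="relocate">
        <ip address="x.x.x.x"/>  <!-- placeholder service VIP -->
        <oracledb home="/oracle/product/11.2.0" name="ORCL" user="oracle"/>
      </service>
    </rm>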


Thanks
pradipta

陈楼 | 12 Nov 09:20 2014

GFS2: fsid=MyCluster:gfs.1: fatal: invalid metadata block

hi, guys
I have a two-node GFS2 cluster on a logical volume on top of a DRBD block device, /dev/drbd0.  The two nodes' GFS2 mount points are exported as Samba shares, and two clients mount them and copy data into them.  Hours later, one client (call it clientA) had finished all its tasks, while the other client (clientB) was still copying, with a very slow write speed (2-3 MB/s; 40-100 MB/s in the normal case).
I suspected something was wrong with the GFS2 filesystem on the server node that clientB mounts from, so I tried to write some data into it by
executing the following command:
[root@dcs-229 ~]# dd if=/dev/zero of=./data2 bs=128k count=1000
1000+0 records in
1000+0 records out
131072000 bytes (131 MB) copied, 183.152 s, 716 kB/s
It shows the write speed is far too slow; the command almost hangs.  I ran it again, and it hung.  I then terminated it with Ctrl+C, and the kernel reported error messages as
follows:
Nov 12 11:50:11 dcs-229 kernel: GFS2: fsid=MyCluster:gfs.1: fatal: invalid metadata block
Nov 12 11:50:11 dcs-229 kernel: GFS2: fsid=MyCluster:gfs.1:   bh = 25 (magic number)
Nov 12 11:50:11 dcs-229 kernel: GFS2: fsid=MyCluster:gfs.1:   function = gfs2_meta_indirect_buffer, file = fs/gfs2/meta_io.c, line = 393
Nov 12 11:50:11 dcs-229 kernel: GFS2: fsid=MyCluster:gfs.1: jid=0: Trying to acquire journal lock...
Nov 12 11:50:11 dcs-229 kernel: Pid: 12044, comm: glock_workqueue Not tainted 2.6.32-358.el6.x86_64 #1
Nov 12 11:50:11 dcs-229 kernel: Call Trace:
Nov 12 11:50:11 dcs-229 kernel: [<ffffffffa044be22>] ? gfs2_lm_withdraw+0x102/0x130 [gfs2]
Nov 12 11:50:11 dcs-229 kernel: [<ffffffff81096cc0>] ? wake_bit_function+0x0/0x50
Nov 12 11:50:11 dcs-229 kernel: [<ffffffffa044bf75>] ? gfs2_meta_check_ii+0x45/0x50 [gfs2]
Nov 12 11:50:11 dcs-229 kernel: [<ffffffffa04367d9>] ? gfs2_meta_indirect_buffer+0xf9/0x100 [gfs2]
Nov 12 11:50:11 dcs-229 kernel: [<ffffffff8105e203>] ? perf_event_task_sched_out+0x33/0x80
Nov 12 11:50:11 dcs-229 kernel: [<ffffffffa0431505>] ? gfs2_inode_refresh+0x25/0x2c0 [gfs2]
Nov 12 11:50:11 dcs-229 kernel: [<ffffffffa0430b48>] ? inode_go_lock+0x88/0xf0 [gfs2]
Nov 12 11:50:11 dcs-229 kernel: [<ffffffffa042f25b>] ? do_promote+0x1bb/0x330 [gfs2]
Nov 12 11:50:11 dcs-229 kernel: [<ffffffffa042f548>] ? finish_xmote+0x178/0x410 [gfs2]
Nov 12 11:50:11 dcs-229 kernel: [<ffffffffa04303e3>] ? glock_work_func+0x133/0x1d0 [gfs2]
Nov 12 11:50:11 dcs-229 kernel: [<ffffffffa04302b0>] ? glock_work_func+0x0/0x1d0 [gfs2]
Nov 12 11:50:11 dcs-229 kernel: [<ffffffff81090ac0>] ? worker_thread+0x170/0x2a0
Nov 12 11:50:11 dcs-229 kernel: [<ffffffff81096c80>] ? autoremove_wake_function+0x0/0x40
Nov 12 11:50:11 dcs-229 kernel: [<ffffffff81090950>] ? worker_thread+0x0/0x2a0
Nov 12 11:50:11 dcs-229 kernel: [<ffffffff81096916>] ? kthread+0x96/0xa0
Nov 12 11:50:11 dcs-229 kernel: [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
Nov 12 11:50:11 dcs-229 kernel: [<ffffffff81096880>] ? kthread+0x0/0xa0
Nov 12 11:50:11 dcs-229 kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
Nov 12 11:50:11 dcs-229 kernel: GFS2: fsid=MyCluster:gfs.1: jid=0: Failed
And the other node also reports error messages:
Nov 12 11:48:50 dcs-226 kernel: Pid: 13784, comm: glock_workqueue Not tainted 2.6.32-358.el6.x86_64 #1
Nov 12 11:48:50 dcs-226 kernel: Call Trace:
Nov 12 11:48:50 dcs-226 kernel: [<ffffffffa0478e22>] ? gfs2_lm_withdraw+0x102/0x130 [gfs2]
Nov 12 11:48:50 dcs-226 kernel: [<ffffffff81096cc0>] ? wake_bit_function+0x0/0x50
Nov 12 11:48:50 dcs-226 kernel: [<ffffffffa0478f75>] ? gfs2_meta_check_ii+0x45/0x50 [gfs2]
Nov 12 11:48:50 dcs-226 kernel: [<ffffffffa04637d9>] ? gfs2_meta_indirect_buffer+0xf9/0x100 [gfs2]
Nov 12 11:48:50 dcs-226 kernel: [<ffffffff8105e203>] ? perf_event_task_sched_out+0x33/0x80
Nov 12 11:48:50 dcs-226 kernel: [<ffffffffa045e505>] ? gfs2_inode_refresh+0x25/0x2c0 [gfs2]
Nov 12 11:48:50 dcs-226 kernel: [<ffffffffa045db48>] ? inode_go_lock+0x88/0xf0 [gfs2]
Nov 12 11:48:50 dcs-226 kernel: GFS2: fsid=MyCluster:gfs.0: fatal: invalid metadata block
Nov 12 11:48:51 dcs-226 kernel: GFS2: fsid=MyCluster:gfs.0:   bh = 66213 (magic number)
Nov 12 11:48:51 dcs-226 kernel: GFS2: fsid=MyCluster:gfs.0:   function = gfs2_meta_indirect_buffer, file = fs/gfs2/meta_io.c, line = 393
Nov 12 11:48:51 dcs-226 kernel: GFS2: fsid=MyCluster:gfs.0: about to withdraw this file system
Nov 12 11:48:51 dcs-226 kernel: GFS2: fsid=MyCluster:gfs.0: telling LM to unmount
Nov 12 11:48:51 dcs-226 kernel: [<ffffffffa045c25b>] ? do_promote+0x1bb/0x330 [gfs2]
Nov 12 11:48:51 dcs-226 kernel: [<ffffffffa045c548>] ? finish_xmote+0x178/0x410 [gfs2]
Nov 12 11:48:51 dcs-226 kernel: [<ffffffffa045d3e3>] ? glock_work_func+0x133/0x1d0 [gfs2]
Nov 12 11:48:51 dcs-226 kernel: [<ffffffffa045d2b0>] ? glock_work_func+0x0/0x1d0 [gfs2]
Nov 12 11:48:51 dcs-226 kernel: [<ffffffff81090ac0>] ? worker_thread+0x170/0x2a0
Nov 12 11:48:51 dcs-226 kernel: [<ffffffff81096c80>] ? autoremove_wake_function+0x0/0x40
Nov 12 11:48:51 dcs-226 kernel: [<ffffffff81090950>] ? worker_thread+0x0/0x2a0
Nov 12 11:48:51 dcs-226 kernel: [<ffffffff81096916>] ? kthread+0x96/0xa0
Nov 12 11:48:51 dcs-226 kernel: [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
Nov 12 11:48:51 dcs-226 kernel: [<ffffffff81096880>] ? kthread+0x0/0xa0
Nov 12 11:48:51 dcs-226 kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
After this, the mount points have crashed.  What should I do?  Can anyone help me?
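
My current plan (untested; device path is a placeholder, please correct
me if this is wrong) is roughly:

    # 1. make sure the filesystem is unmounted on both nodes
    umount /mnt/gfs2
    # 2. read-only check first, then a repair pass
    fsck.gfs2 -n /dev/vg_drbd/lv_gfs2
    fsck.gfs2 -y /dev/vg_drbd/lv_gfs2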


