李文逍 (Gavin Lee) | 1 Apr 2010 06:22

How to restart one resource in a cluster manually without the cluster detecting it

Hi, all

I ran into a problem: we use the cluster, but I sometimes need to restart one of its resources manually.

The scenario is like this:

For example, one resource, httpd, is managed by Red Hat Cluster Suite.

Sometimes I need to restart it manually with: service httpd restart

The script is located in /etc/init.d/httpd

When I restart it, the cluster sometimes detects that httpd is stopped and reports an error.

How can I restart just one of the resources without the cluster detecting a failure? We need to restart only that one resource, not all of them.

So restarting the whole resource group doesn't work for us.

Does anyone have similar experience?

Thanks & Best Regards!!


---------------------------
Gavin Lee
Joseph L. Casale | 1 Apr 2010 07:00

Re: How to restart one resource in a cluster manually without the cluster detecting it

>How can I restart only one of the resources and do not let the cluster detect the failure? We need to restart
>only one of the resources but not restart all the resources.
>So restart the whole resource group doesn't work for us.

$ info clusvcadm

Look for the -Z option; it freezes the service on the member and prevents status checks.

Don't forget to unfreeze (-U) :)
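
The freeze, restart, unfreeze sequence suggested here could be wrapped roughly as follows. This is only a sketch: `web_svc` is a hypothetical rgmanager service name, `RUN` is an illustrative dry-run switch that is not part of any tool, and `-U` is the clusvcadm option that unfreezes a service again.

```shell
#!/bin/sh
# Sketch: freeze a service, restart one init script inside it, then unfreeze.
# Set RUN=echo to print the commands instead of executing them (dry run).

restart_frozen() {
    svc="$1"     # rgmanager service name, e.g. "web_svc" (hypothetical)
    init="$2"    # init script name, e.g. "httpd"

    # Freeze first, so rgmanager stops running status checks on the service.
    ${RUN:-} clusvcadm -Z "$svc" || return 1

    ${RUN:-} service "$init" restart
    rc=$?

    # Always unfreeze, even if the restart itself failed.
    ${RUN:-} clusvcadm -U "$svc" || return 1
    return "$rc"
}

# Usage (as root on the member that currently runs the service):
#   restart_frozen web_svc httpd
```

Unfreezing unconditionally keeps a failed restart from leaving the service frozen, in which case the cluster would never notice the outage.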

Markus Wolfgart | 2 Apr 2010 10:00

Re: gfs2-utils source for recovery purpose of a corrupt gfs2 partition

Hi Cluster/GFS Experts,
Hi Bob,

as I got no response concerning my recovery issue, I would like to
summarize my activities, which could help someone else running into such a
problem with gfs2.

As the corrupted gfs2 (12TB before the grow, 25TB after) was hosted on a SE6540
disk array and the master is a Sun X4150 machine with 4GB of RAM and CentOS 5.3
(i686/PAE), I ran into the out-of-memory problem while running fsck.gfs2.

No matter what I did, fsck was not even able to use the temporary swap
file suggested in some postings.

As the OS installation was done by other people and they insist on this
configuration, I booted a rescue x86_64 DVD in order to overcome the
memory restriction.

In addition to this I was lucky to have some spare memory to increase
the RAM to 16GB.

As I don't like to run the lvm/cman software and, honestly
speaking, don't have much experience with it, I created and mounted a large
xfs partition on the disk array to hold a temporary swap file and to
store the files I hope to recover from the corrupted gfs2 partition.

An investigation via dd | od -c on the first MB of the gfs2 partition
revealed that the sb (super block) of the gfs2 starts after the lvm2
block, which is 192k in size.

Creating a loopback device with an offset of 196608 bytes let me access
the file system via fsck without dlm/clvm etc.

losetup /dev/loop4 /dev/sdb  -o 196608

/sbin/fsck.gfs2 -f -p -y -v /dev/loop4

The index of the loop device depends on the usage of the rescue system.
Check it with losetup -a and take a number which is not currently used.
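
The offset given to losetup is simply the 192k LVM2 area expressed in bytes. A minimal sketch of that calculation, which also lets `losetup -f` pick the first unused loop device instead of choosing a number by hand; the function names are illustrative only:

```shell
#!/bin/sh
# Sketch: compute the byte offset and attach the first free loop device.

# Convert a size in KiB to bytes.
kib_to_bytes() {
    echo $(( $1 * 1024 ))
}

# Attach DEV at an offset of LVM_KIB KiB; prints the loop device used.
attach_at_offset() {
    dev="$1"; lvm_kib="$2"
    loopdev="$(losetup -f)" || return 1      # first unused loop device
    losetup -o "$(kib_to_bytes "$lvm_kib")" "$loopdev" "$dev" || return 1
    echo "$loopdev"
}

kib_to_bytes 192     # prints 196608, the offset used above

# Usage (as root):
#   loopdev="$(attach_at_offset /dev/sdb 192)"
#   /sbin/fsck.gfs2 -f -p -y -v "$loopdev"
```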

After several attempts at checking the gfs2, each running into the OOM
again, my temporary swap space is now about 0.7TB (no joke).

I started with 20GB of swap space and doubled the size after every OOM
abort of fsck.
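
The doubling schedule can be sketched as below, together with the usual dd/mkswap/swapon steps for a swap file. The function names and the swap file path are hypothetical.

```shell
#!/bin/sh
# Sketch: print the swap sizes (in GB) obtained by doubling from START
# until LIMIT is reached or exceeded.
swap_schedule() {
    size="$1"; limit="$2"
    while [ "$size" -lt "$limit" ]; do
        printf '%s ' "$size"
        size=$(( size * 2 ))
    done
    printf '%s\n' "$size"
}

# Create and enable a swap file of SIZE_GB gigabytes (needs root).
make_swap() {
    file="$1"; size_gb="$2"
    dd if=/dev/zero of="$file" bs=1M count=$(( size_gb * 1024 )) || return 1
    chmod 600 "$file"
    mkswap "$file" && swapon "$file"
}

swap_schedule 20 600    # prints: 20 40 80 160 320 640

# Usage (as root; the path is a hypothetical location on the xfs partition):
#   make_swap /mnt/xfs/swapfile.bin 640
```

Five doublings from 20GB give 640GB, which is in the region of the figure quoted above.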

Now I was lucky enough to pass the first check and get into the second one:

Initializing fsck
Initializing lists...
jid=0: Looking at journal...
jid=0: Journal is clean.
jid=1: Looking at journal...
jid=1: Journal is clean.
jid=2: Looking at journal...
jid=2: Journal is clean.
jid=3: Looking at journal...
jid=3: Journal is clean.
jid=4: Looking at journal...
jid=4: Journal is clean.
jid=5: Looking at journal...
jid=5: Journal is clean.
jid=6: Looking at journal...
jid=6: Journal is clean.
jid=7: Looking at journal...
jid=7: Journal is clean.
Initializing special inodes...
Validating Resource Group index.
Level 1 RG check.
Level 2 RG check.
Existing resource groups:
1: start: 17 (0x11), length = 529563 (0x8149b)
2: start: 529580 (0x814ac), length = 524241 (0x7ffd1)
3: start: 1053821 (0x10147d), length = 524241 (0x7ffd1)
4: start: 1578062 (0x18144e), length = 524241 (0x7ffd1)
...
9083643: start: 4762017571061 (0x454be5da0f5), length = 524241 (0x7ffd1)
9083644: start: 4762018095302 (0x454be65a0c6), length = 524241 (0x7ffd1)
9083645: start: 4762018619543 (0x454be6da097), length = 524241 (0x7ffd1)
9083646: start: 4762019143784 (0x454be75a068), length = 524241 (0x7ffd1)
...
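
The listing can be cross-checked with a little arithmetic. The sketch below assumes 4096-byte blocks (the block size reported by the super block dump elsewhere in this thread) and evenly spaced RGs from RG 2 onwards; the function names are illustrative.

```shell
#!/bin/sh
# Sketch: arithmetic on the resource group (RG) listing above.
# Assumption: 4096-byte blocks.
BLOCK_SIZE=4096
FIRST_REGULAR_START=529580   # start block of RG 2 in the listing
RG_LENGTH=524241             # length in blocks of RG 2 and later

# Start block of RG number N (N >= 2), assuming evenly spaced RGs.
rg_start() {
    echo $(( FIRST_REGULAR_START + ($1 - 2) * RG_LENGTH ))
}

# Convert a block count to whole TiB.
blocks_to_tib() {
    echo $(( $1 * BLOCK_SIZE / 1024 / 1024 / 1024 / 1024 ))
}

rg_start 9083646                     # prints 4762019143784, as in the listing
blocks_to_tib "$(rg_start 9083646)"  # prints 17739
```

By this arithmetic the index describes RGs up to roughly 17,739 TiB into the device, far beyond the 25TB the file system was grown to. That may be part of why fsck needs so much memory to walk the list.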

In addition to this I started to explore the code of gfs2-utils
(folders libgfs2 and fsck) and was able to list the super block
info.

As mentioned in my previous posting, I was able to list all the file names
of interest in a 7TB image created from the dd output.

All the files I'm looking for (about 16 thousand) were found in the
directory structure and can be seen with a simple od -s (string mode) or
with the xxd command.

xxd -a -u -c 64 -s 671088640 dev_oa_vg_storage1_oa_lv_storage1.bin | less

The first snippet of code I used to play around with is listed
below; it is just a plain cut and paste of the utils code.
The code just shows some information from the super block.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <limits.h>
#include <errno.h>
#include <ctype.h>
#include <libintl.h>
#define _(String) gettext(String)
/* local header with the libgfs2 declarations; die() is assumed to come from it */
#include "gfs2structure.h"

int main(int argc, char *argv[])
{
	int fd;
	char *device;
	unsigned char buf[GFS2_BASIC_BLOCK];
	struct gfs2_sb sb;
	struct gfs2_buffer_head dummy_bh;

	if (argc < 2) {
		fprintf(stderr, _("usage: %s <device or image>\n"), argv[0]);
		exit(-1);
	}

	/* zero the dummy buffer head so that dummy_bh.sdp is NULL, not garbage */
	memset(&dummy_bh, 0, sizeof(dummy_bh));
	dummy_bh.b_data = (char *)buf;

	device = argv[1];

	fd = open(device, O_RDONLY);
	if (fd < 0)
		die("can't open %s: %s\n", device, strerror(errno));

	/* the super block sits GFS2_SB_ADDR basic blocks into the device */
	if (lseek(fd, GFS2_SB_ADDR * GFS2_BASIC_BLOCK, SEEK_SET) !=
	    GFS2_SB_ADDR * GFS2_BASIC_BLOCK) {
		fprintf(stderr, _("bad seek: %s from %s:%d: superblock\n"),
			strerror(errno), __FUNCTION__, __LINE__);
		exit(-1);
	}
	if (read(fd, buf, GFS2_BASIC_BLOCK) != GFS2_BASIC_BLOCK) {
		fprintf(stderr, _("bad read: %s from %s:%d: superblock\n"),
			strerror(errno), __FUNCTION__, __LINE__);
		exit(-1);
	}

	/* convert the on-disk super block into the in-memory structure */
	gfs2_sb_in(&sb, &dummy_bh);

	if (sb.sb_header.mh_magic != GFS2_MAGIC ||
	    sb.sb_header.mh_type != GFS2_METATYPE_SB)
		die( _("there isn't a GFS2 filesystem on %s\n"), device);

	printf( _("current lock protocol name = \"%s\"\n"), sb.sb_lockproto);
	printf( _("current lock table name = \"%s\"\n"), sb.sb_locktable);
	printf( _("current ondisk format = %u\n"), sb.sb_fs_format);
	printf( _("current multihost format = %u\n"), sb.sb_multihost_format);
	//printf( _("current uuid = %s\n"), str_uuid(sb.sb_uuid));
	printf( _("current block size = %u\n"), sb.sb_bsize);
	printf( _("current block size shift = %u\n"), sb.sb_bsize_shift);

	/* inode addresses and formal inode numbers are 64-bit values */
	printf( _("masterdir-addr = %llu\n"),
		(unsigned long long)sb.sb_master_dir.no_addr);
	printf( _("masterdir-fino = %llu\n"),
		(unsigned long long)sb.sb_master_dir.no_formal_ino);
	printf( _("rootdir-addr = %llu\n"),
		(unsigned long long)sb.sb_root_dir.no_addr);
	printf( _("rootdir-fino = %llu\n"),
		(unsigned long long)sb.sb_root_dir.no_formal_ino);

	printf( _("dummy_bh.sdp = %p\n"), (void *)dummy_bh.sdp);

	/* nothing in this snippet sets dummy_bh.sdp, so only dereference it
	   once it points to an initialized struct gfs2_sbd */
	if (dummy_bh.sdp != NULL) {
		printf( _("sdp->blks_alloced = %llu\n"),
			(unsigned long long)dummy_bh.sdp->blks_alloced);
		printf( _("sdp->blks_total = %llu\n"),
			(unsigned long long)dummy_bh.sdp->blks_total);
		printf( _("sdp->device_name = %s\n"), dummy_bh.sdp->device_name);
	}

	/* directory entry experiments, not enabled yet: */
	//gfs2_dirent_in(&dirent, (char *)dentp);
	//gfs2_dirent_print(&dirent, output);
	//gfs2_dinode_print(struct gfs2_dinode *di);

	close(fd);
	return 0;
}

I will keep you all informed on the progress of this story.

My next step will be - depending on whether the fsck fails or not -
to overwrite the "lock_" and/or "fsck_" flags
in the image and to mount the gfs2 image to see what happens.

Meanwhile, during the run of fsck, which I was told could take a while
(used swap space is now more than 510GB), I hope someone can show
me how to walk through the inodes using libgfs2 to collect data from them,
or point me in the right direction.

Many Thanks in Advance
and a nice Easter weekend.

Bye
Markus

*******************************************************
Markus Wolfgart
DLR Oberpfaffenhofen
German Remote Sensing Data Center
.
.
.
e-mail: markus.wolfgart <at> dlr.de
**********************************************************

Hi Bob,

thanks for the prompt reply!

The fs was originally 12.4TB (6TB used) in size.
After a resize attempt to 25TB with gfs2_grow (a very, very old version,
gfs2-utils 1.62),
the fs was expanded and the first impression looked good, as df reported a
size of 25TB.
But looking at the fs from the second node (two-node system), ls -r and ls
-R threw
IO errors and the gfs2 mount froze (a reboot of the machine was performed).
As gfs2 cannot be shrunk to roll back, the additional
physical volume was removed from the logical volume (lvresize to the original
size & pvremove).
I hoped this hard cut of the unfenced gfs2 partition could be
repaired by
fsck.gfs2 (newest version); that was my thought.
Whether or not that would have worked, I could not run fsck.gfs2 due to
an "Out of memory in compute_rgrp_layout" message.

see strace output:

write(1, "9098813: start: 4769970307031 (0"..., 739098813: start:
4769970307031 (0x4569862bfd7), length = 524241 (0x7ffd1)
) = 73
write(1, "9098814: start: 4769970831272 (0"..., 739098814: start:
4769970831272 (0x456986abfa8), length = 524241 (0x7ffd1)
) = 73
write(1, "9098815: start: 4769971355513 (0"..., 739098815: start:
4769971355513 (0x4569872bf79), length = 524241 (0x7ffd1)
) = 73
write(1, "9098816: start: 4769971879754 (0"..., 739098816: start:
4769971879754 (0x456987abf4a), length = 524241 (0x7ffd1)
) = 73
write(1, "9098817: start: 4769972403995 (0"..., 739098817: start:
4769972403995 (0x4569882bf1b), length = 524241 (0x7ffd1)
) = 73
brk(0xb7dea000)                         = 0xb7dc9000
mmap2(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
-1, 0) = -1 ENOMEM (Cannot allocate memory)
mmap2(NULL, 2097152, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE,
-1, 0) = -1 ENOMEM (Cannot allocate memory)
mmap2(NULL, 1048576, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE,
-1, 0) = -1 ENOMEM (Cannot allocate memory)
mmap2(NULL, 2097152, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE,
-1, 0) = -1 ENOMEM (Cannot allocate memory)
mmap2(NULL, 1048576, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE,
-1, 0) = -1 ENOMEM (Cannot allocate memory)
write(2, "Out of memory in compute_rgrp_la"..., 37Out of memory in
compute_rgrp_layout
) = 37
exit_group(-1)                          = ?

As I had already increased my swap space

swapon -s
Filename                                Type            Size      Used    Priority
/dev/sda3                               partition       8385920   0       -3
/var/swapfile.bin                       file            33554424  144     1

and ran into the same situation as before, I decided to start extracting
the lost files with a C program.
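
The strace output above ends with a heap break at 0xb7dea000 followed by ENOMEM, which is what a 32-bit process looks like when it runs out of address space rather than out of RAM or swap. A small sketch of that reading, assuming `getconf LONG_BIT` reflects the word size of the userland that fsck runs in; the function names are illustrative:

```shell
#!/bin/sh
# Sketch: convert the heap break address from the strace output to MiB
# and check the userland word size before starting a large fsck.

# Convert an address (hex with 0x prefix is accepted) to whole MiB.
addr_to_mib() {
    echo $(( $1 / 1024 / 1024 ))
}

addr_to_mib 0xb7dea000    # prints 2941

# Warn when the userland is 32-bit: a single process then cannot address
# more than 4 GiB, no matter how much swap is configured.
check_word_size() {
    bits="$(getconf LONG_BIT)"
    if [ "$bits" -lt 64 ]; then
        echo "warning: ${bits}-bit userland, fsck is limited by its address space" >&2
        return 1
    fi
    return 0
}
```

At roughly 2.9 GiB the process is close to the ceiling of a 32-bit address space, so extra swap would not be expected to help on the i686/PAE installation.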

Now I have created a big image (7TB) on an xfs partition and would like to
recover my files of interest
with a program using libgfs2 or parts of the gfs2-utils source, as
mentioned in my previous posting.
As I can see nearly all of the files in the dir structure and get
their position in the image with
a simple string search, I hope to extract them in a simpler way.

The RG size was set to the max value of 2GB and each file I'm looking
for is about 250MB in size.
The number of files to be recovered is more than 16k.
Every file has a header with its file name and the total size, so it
should be easy to check whether
its recovery was successful.

So that's my theory, but this could be an Easter vacation project without
the right knowledge of gfs2.
As I'm lucky to have the gfs2-utils source, I hope it can be done.
But if there is a simpler way to do a recovery with the installed gfs2
programs like gfs2_edit or gfs2_tool
or other tools, it would be nice if someone could show me the proper way.

Many Thanks in advance

Markus

--
*******************************************************
Markus Wolfgart
DLR Oberpfaffenhofen
German Remote Sensing Data Center
.
.
.
e-mail: markus.wolfgart <at> dlr.de
**********************************************************

 ----- "Markus Wolfgart" <markus wolfgart dlr de> wrote:
| Hello Cluster and GFS Experts,
|
| I'm a new subscriber of this mailing list and apologise
| in case my posting is off-topic.
|
| I'm looking for help concerning a corrupt gfs2 file system
| which I could not recover with fsck.gfs2 (Ver. 3.0.9)
| due to too little physical memory (4GB), even after increasing it
| with additional swap space (now about 35GB).
|
| I would like to parse an image created from the lost fs (the first 6TB)
| with the code provided in the new gfs2-utils release.
|
| Given this circumstance, I hope to find on this mailing list some
| hints
| concerning an automated step-by-step recovery of lost data.
|
| Many thanks in advance for your help
|
| Markus

Hi Markus,

You said that fsck.gfs2 is not working but you did not say what
messages it gives you when you try.  This must be a very big
file system.  How big is it?  Was it converted from gfs1?

Regards,

Bob Peterson
Red Hat File Systems

Jankowski, Chris | 2 Apr 2010 13:20

Listing openAIS parameters on RHEL Cluster Suite 5

Hi,

As per Red Hat Knowledgebase note 18886, on RHEL 5.4 I should be able to get the current in-memory values of the
openAIS parameters by running the following commands:

# openais-confdb-display 
totem.version = '2'
totem.secauth = '1'
# openais-confdb-display totem token
totem.token = '10000'
# openais-confdb-display totem consensus
totem.consensus = '4800'
# openais-confdb-display totem token_retransmits_before_loss_const
totem.token_retransmits_before_loss_const = '20'
# openais-confdb-display cman quorum_dev_poll
cman.quorum_dev_poll = '40000'
# openais-confdb-display cman expected_votes 
cman.expected_votes = '3'
# openais-confdb-display cman two_node
cman.two_node = '1'  

On my 5.4 cluster it works for the first 4 commands, but for the last 3 commands I get, respectively:

Could not get "quorum _dev_poll" :1
Could not get "expected_votes" :1
Could not get "two_node" :1

Does anybody know what is going on here?
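
To compare nodes or releases it may help to run all the lookups in one go. A minimal sketch using only the object/key pairs from the listing above; `CONFDB_CMD` is an illustrative override (not part of the tool) so the loop can be tried without a cluster:

```shell
#!/bin/sh
# Sketch: run every object/key lookup from the listing and label the output.
query_keys() {
    cmd="${CONFDB_CMD:-openais-confdb-display}"
    for pair in "totem token" "totem consensus" \
                "totem token_retransmits_before_loss_const" \
                "cman quorum_dev_poll" "cman expected_votes" "cman two_node"
    do
        echo "== $pair"
        # $pair is left unquoted on purpose so it splits into object and key.
        "$cmd" $pair
    done
}

# Usage (on a cluster node):
#   query_keys
```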

Thanks and regards,

Chris

Jankowski, Chris | 2 Apr 2010 13:26

Is it worth setting up jumbo Ethernet frames on the cluster interconnect link?

Hi,
 
On a heavily used cluster with GFS2, is it worth setting up jumbo Ethernet frames on the cluster interconnect link?  Obviously, if only a minuscule portion of the packets travelling through this link is larger than the standard 1500-byte MTU, then why bother.
 
I am seeing significant traffic on the link up to 50,000 packets per second.
 
Thanks and regards,
 
Chris
 
Jeff Sturm | 2 Apr 2010 16:00

Re: Is it worth setting up jumbo Ethernet frames on the cluster interconnect link?

You probably wouldn't gain much, if anything.  I see packets averaging around 130-160 bytes in size on one of our clusters.  Fewer than 1% of packets are greater than 1400 bytes in size.

 

On the other hand, if you are using Ethernet for storage, you almost certainly want jumbo frames on any such interfaces.
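
An average packet size like the one quoted above can be estimated from the kernel's interface counters. A minimal sketch, assuming the Linux sysfs statistics files under /sys/class/net are available; `eth1` is a hypothetical name for the interconnect interface and the function names are illustrative:

```shell
#!/bin/sh
# Sketch: estimate the average packet size on an interface from the
# kernel's byte and packet counters.

# Average size in bytes, given a byte count and a packet count.
avg_packet_size() {
    bytes="$1"; packets="$2"
    if [ "$packets" -eq 0 ]; then
        echo 0
    else
        echo $(( bytes / packets ))
    fi
}

# Read the receive counters of IFACE from sysfs and print the average.
iface_avg_rx() {
    stats="/sys/class/net/$1/statistics"
    avg_packet_size "$(cat "$stats/rx_bytes")" "$(cat "$stats/rx_packets")"
}

avg_packet_size 1450000 10000    # prints 145

# Usage ("eth1" is a hypothetical interconnect interface):
#   iface_avg_rx eth1
```

An average far below 1500 bytes suggests that few packets would ever fill a standard frame, let alone a jumbo one.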

 

Jeff

 


Markus Wolfgart | 2 Apr 2010 17:52

Re: gfs2-utils source for recovery purpose of a corrupt gfs2 partition

Hi Cluster/GFS Experts,

I was playing around with libgfs2 and the fsck source and
ran the function "initialize" on a small 30GB fragment of the
binary image of the corrupted partition, hosted on my notebook.

...
if ((sdp = (struct gfs2_sbd*)calloc(1,sizeof(struct gfs2_sbd)))==NULL)
	{
	  printf( _("sdp-adr!! = %p\n"), sdp);
	  //fill_super_block(sdp);
	}
	else
	{
	  printf( _("sdp-adr = %p\n"), sdp);
	  //retval=fill_super_block(sdp);
	  retval=initialize(sdp, 0, 0, &all_clean);
	}
...

The output I get from it, in addition to my own, is listed below.

>bin/Release> ./recover_file_from_gfs2_image
/mnt/mybook/GFS2-Problem/dev_oa_vg_storage1_oa_lv_storage1.bin

current lock protocol name = "fsck_dlm"
current lock table name = "oa-dp:oa_gfs1"
current ondisk format = 1801
current multihost format = 1900
current block size = 4096
current block size shift = 12
masterdir-addr = 51
masterdir-fino = 2
rootdir-fino = 50
rootdir-fino = 1
dummy_bh.sdp = 0x4016c0
sdp-adr = 0x619010
Validating Resource Group index.
Level 1 RG check.
(level 1 failed)
Level 2 RG check.
L2: number of rgs in the index = 11541.
L2: number of rgs expected     = 150.
L2: They don't match; either (1) the fs was extended, (2) an odd
L2: rg size was used, or (3) we have a corrupt rg index.
(level 2 failed)
Level 3 RG check.
RG 2 is damaged: getting dist from index: 0x8149b
  RG 1 at block 0x11 intact [length 0x8149b]
  RG 2 at block 0x814AC intact [length 0x21fd1]
  RG 3 at block 0xA347D intact [length 0x21fd1]
* RG 4 at block 0xC544E *** DAMAGED *** [length 0x21fd1]
* RG 5 at block 0xE741F *** DAMAGED *** [length 0x21fd1]
* RG 6 at block 0x1093F0 *** DAMAGED *** [length 0x21fd1]
* RG 7 at block 0x12B3C1 *** DAMAGED *** [length 0x21fd1]
Error: too many bad RGs.
Error rebuilding rg list.
(level 3 failed)
RG recovery impossible; I can't fix this file system.
sdp->blks_alloced = 3071213568
sdp->blks_total = 1711852225

A second run on a 50GB fragment ran into an OOM, despite
4GB RAM and 4GB swap.

Validating Resource Group index.
Level 1 RG check.
(level 1 failed)
Level 2 RG check.
Out of memory in compute_rgrp_layout

So how much virtual memory should be provided to get a successful
run for 128k big RGs, let's say for a 1TB fs, when only 50GB of data
causes more than 8GB of memory to be allocated?

Bye and many thanks for information

Markus

Hi Cluster/GFS Experts,
Hi Bob,

as I get no response concerning my recovery issue, I would like to
summarize my activities, which could help someone else running in such a
problem with gfs2.

As the corrupted gfs2 (12TB b4 grow 25TB after) was hosted on a SE6540
disk array and the master is a Sun X4150 4GB machine with a CentOS 5.3
(i686/PAE), I run in the out of memory problem during the run of fsck.gfs2.

No matter what i have done, I was not able even use the temporary swap
file as found in some postings suggested.

As the os installation was done by other guys and they insist on this
configuration, I boot a rescue x86_64 dvd in order to overcame the
memory restriction.

In addition to this I was lucky to have some spare memory to increase
the ram to 16GB.

As I don't like to run the lvm/cman software as well as honestly
speeking not having much experience on this, I create and mount a large
xfs partition an the disk array to create a temporary swap file and to
store the files I hope to recover from the corrupted gfs2 partition.

An investigation via dd | od -c on the first mb of the gfs2 partition
reveal that after the lvm2 block of a size of 192k the sb (super block)
of the gfs2 starts.

creating an loopback device with an offset of 196608 bytes let my access
the file system via fsck without dlm/clvm etc.

losetup /dev/loop4 /dev/sdb  -o 196608

/sbin/fsck.gfs2 -f -p -y -v /dev/loop4

The index of the loop device depends on the usage of the rescue system.
Check it with losetup -a and take a number which is not currently used.

After some attempts on checking the gfs2 running again in the oom my
temp swap space is now about 0.7TG (no joke).

I start with 20GB of swap space and double the size every oom abort of
fsck.

Now I was lucky to pass the first and run into the second check

Initializing fsck
Initializing lists...
jid=0: Looking at journal...
jid=0: Journal is clean.
jid=1: Looking at journal...
jid=1: Journal is clean.
jid=2: Looking at journal...
jid=2: Journal is clean.
jid=3: Looking at journal...
jid=3: Journal is clean.
jid=4: Looking at journal...
jid=4: Journal is clean.
jid=5: Looking at journal...
jid=5: Journal is clean.
jid=6: Looking at journal...
jid=6: Journal is clean.
jid=7: Looking at journal...
jid=7: Journal is clean.
Initializing special inodes...
Validating Resource Group index.
Level 1 RG check.
Level 2 RG check.
Existing resource groups:
1: start: 17 (0x11), length = 529563 (0x8149b)
2: start: 529580 (0x814ac), length = 524241 (0x7ffd1)
3: start: 1053821 (0x10147d), length = 524241 (0x7ffd1)
4: start: 1578062 (0x18144e), length = 524241 (0x7ffd1)
...
9083643: start: 4762017571061 (0x454be5da0f5), length = 524241 (0x7ffd1)
9083644: start: 4762018095302 (0x454be65a0c6), length = 524241 (0x7ffd1)
9083645: start: 4762018619543 (0x454be6da097), length = 524241 (0x7ffd1)
9083646: start: 4762019143784 (0x454be75a068), length = 524241 (0x7ffd1)
...

In addition to this I start to explore the code of gfs2-utils
(folder libgfs2 and folder fsck) and was able to list the super block
infos.

As mentioned im my previous posting I was able to list all my file names
of interest located in a 7TB big image created from the dd output.

all files I'm looking for found in the directory structure (about 16
tousend) could be seen  by a simple od -s (string mode) or by the xxd
command.

xxd -a -u -c 64 -s 671088640 dev_oa_vg_storage1_oa_lv_storage1.bin | less

The first snippet of code I'm used to play around looks like listed
below and is just plain a cut and paste of the utils code:
The code just show some information of the super block.

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <limits.h>
#include <errno.h>
#include <ctype.h>
#include <libintl.h>
#define _(String) gettext(String)
#include "gfs2structure.h"

int main(int argc, char *argv[])
{
	int fd;
	char *device, *field;

	unsigned char buf[GFS2_BASIC_BLOCK];
	unsigned char input[256];
	unsigned char output[256];
	
	struct gfs2_sb sb;
	struct gfs2_buffer_head dummy_bh;
	struct gfs2_dirent dirent,*dentp;;
	
	//struct gfs2_inum  sbmd;
	//struct gfs2_inum  sbrd;
	
	
	dummy_bh.b_data = (char *)buf;
	
	//memset(&dirent, 0, sizeof(struct gfs2_dirent));
	
	device = argv[1];
	
	fd = open(device, O_RDONLY);
	
	if (fd < 0)
		die("can't open %s: %s\n", device, strerror(errno));
	
	if (lseek(fd, GFS2_SB_ADDR * GFS2_BASIC_BLOCK, SEEK_SET) !=
	    GFS2_SB_ADDR * GFS2_BASIC_BLOCK) {
		fprintf(stderr, _("bad seek: %s from %s:%d: superblock\n"),
			strerror(errno), __FUNCTION__, __LINE__);
		exit(-1);
	}
	if (read(fd, buf, GFS2_BASIC_BLOCK) != GFS2_BASIC_BLOCK) {
		fprintf(stderr, _("bad read: %s from %s:%d: superblock\n"),
			strerror(errno), __FUNCTION__, __LINE__);
		exit(-1);
	}
	
	gfs2_sb_in(&sb, &dummy_bh);
	
	if (sb.sb_header.mh_magic != GFS2_MAGIC ||
	    sb.sb_header.mh_type != GFS2_METATYPE_SB)
		die( _("there isn't a GFS2 filesystem on %s\n"), device);
		
	printf( _("current lock protocol name = \"%s\"\n"),sb.sb_lockproto);	
	
	printf( _("current lock table name = \"%s\"\n"),sb.sb_locktable);
	
	printf( _("current ondisk format = %u\n"),sb.sb_fs_format);
	
	printf( _("current multihost format = %u\n"),sb.sb_multihost_format);
	
	//printf( _("current uuid = %s\n"), str_uuid(sb.sb_uuid));
	
	printf( _("current block size = %u\n"), sb.sb_bsize);
	
	printf( _("current block size shift = %u\n"), sb.sb_bsize_shift);
		
	printf( _("masterdir-addr = %u\n"), sb.sb_master_dir.no_addr);
	printf( _("masterdir-fino = %u\n"), sb.sb_master_dir.no_formal_ino);
	printf( _("rootdir-fino = %u\n"), sb.sb_root_dir.no_addr);
	printf( _("rootdir-fino = %u\n"), sb.sb_root_dir.no_formal_ino);
	
	printf( _("dummy_bh.sdp = %p\n"), dummy_bh.sdp);
	
	printf( _("sdp->blks_alloced = %u\n"), dummy_bh.sdp->blks_alloced);
	printf( _("sdp->blks_total = %u\n"), dummy_bh.sdp->blks_total);
	printf( _("sdp->device_name = %s\n"), dummy_bh.sdp->device_name);
	
	
	//gfs2_dirent_in(&dirent, (char *)dentp);

	//gfs2_dirent_print(&dirent, output);
	
        //gfs2_dinode_print(struct gfs2_dinode *di);
	
	
	close(fd);
}

I will keep you all informed on the progress of this story.

My next step will be - depending on the further progress of the fsck -
(if it fails or not) to overwrite the "lock_" and/or "fsck_" flags
in the image and to mount the gfs2 image to see what happens.

Meanwhile during the run of fsck which could take a while (used swap
space now is more the 510GB) as I was told, I hope someone could show
me how to run through the inodes using libgfs2 to collect data from them
or to point me to the right direction.

Many Thanks in Advance
and a nice Easter weekend.

Bye
Markus

*******************************************************
Markus Wolfgart
DLR Oberpfaffenhofen
German Remote Sensing Data Center
.
.
.
e-mail: markus.wolfgart <at> dlr.de
**********************************************************

Hi Bob,

thanks for prompt reply!

the fs originally was 12.4TB (6TB used) big.
After a resize attempt to 25TB by gfs2_grow (very very old version
gfs2-utils 1.62)
The fs was expand and the first impression looks good as df reported the
size of 25TB.
But looking from the second node to the fs (two nod system) ls -r and ls
-R throws
IO errors and gfs2 mount get frozen (reboot of machine was performed).
As no shrinking of gfs2 was possible to rollback, the additional
physical volume was removed from the logical volume (lvresize to org.
size & pvremove).
This hard cut of the gsf2 unfenced partition should be hopefully
repaired by the
fsck.gfs2 (newest version), this was my thought.
Even if this will not be the case, I could not run the fsck.gfs2 due to
a "of memory in compute_rgrp_layout" message.

see strace output:

write(1, "9098813: start: 4769970307031 (0"..., 739098813: start:
4769970307031 (0x4569862bfd7), length = 524241 (0x7ffd1)
) = 73
write(1, "9098814: start: 4769970831272 (0"..., 739098814: start:
4769970831272 (0x456986abfa8), length = 524241 (0x7ffd1)
) = 73
write(1, "9098815: start: 4769971355513 (0"..., 739098815: start:
4769971355513 (0x4569872bf79), length = 524241 (0x7ffd1)
) = 73
write(1, "9098816: start: 4769971879754 (0"..., 739098816: start:
4769971879754 (0x456987abf4a), length = 524241 (0x7ffd1)
) = 73
write(1, "9098817: start: 4769972403995 (0"..., 739098817: start:
4769972403995 (0x4569882bf1b), length = 524241 (0x7ffd1)
) = 73
brk(0xb7dea000)                         = 0xb7dc9000
mmap2(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
-1, 0) = -1 ENOMEM (Cannot allocate memory)
mmap2(NULL, 2097152, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE,
-1, 0) = -1 ENOMEM (Cannot allocate memory)
mmap2(NULL, 1048576, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE,
-1, 0) = -1 ENOMEM (Cannot allocate memory)
mmap2(NULL, 2097152, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE,
-1, 0) = -1 ENOMEM (Cannot allocate memory)
mmap2(NULL, 1048576, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE,
-1, 0) = -1 ENOMEM (Cannot allocate memory)
write(2, "Out of memory in compute_rgrp_la"..., 37Out of memory in
compute_rgrp_layout
) = 37
exit_group(-1)                          = ?

As I had already increased my swapspace
swapon -s
Filename                                Type            Size    Used
Priority
/dev/sda3                               partition       8385920 0       -3
/var/swapfile.bin                       file            33554424
144     1
 and run again the same situation as before I decide to start to extract
the lost files by a c prog.

Now I have created a big image (7TB) on an XFS partition and would like to
recover my files of interest with a program using libgfs2 or part of the
source from gfs2-utils, as mentioned in my previous posting.
As I can see nearly all of the files in the directory structure and can get
their positions in the image with a simple strings command, I hope to
extract them in a simpler way.

The RG size was set to the maximum value of 2GB, and each file I'm looking
for is about 250MB in size.
There are more than 16k files to recover.
Every file has a header with its file name and total size, so it
should be easy to check whether the recovery of a file was successful.
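
The header-based check described above could be sketched roughly as follows. This is only an illustration under loud assumptions: the marker `HDR1`, the 2-byte name length and the 8-byte total size are a hypothetical header layout (the real format is not given in this thread), the total size is assumed to count from the start of the header, and each file is assumed to lie contiguously in the image, which GFS2 does not guarantee. Mapping a 7TB image this way also needs a 64-bit Python.

```python
import mmap
import os
import struct

# Hypothetical header layout: MAGIC, name length, name, total size.
# None of this comes from the real file format; adjust all of it.
MAGIC = b"HDR1"
NAME_LEN = struct.Struct("<H")     # assumed 2-byte little-endian name length
TOTAL_SIZE = struct.Struct("<Q")   # assumed 8-byte little-endian total size
CHUNK = 16 * 1024 * 1024           # copy in 16 MiB pieces to limit memory use


def find_headers(buf, magic=MAGIC):
    """Yield (offset, name, size) for every parsable header found in buf."""
    pos = buf.find(magic)
    while pos != -1:
        try:
            (name_len,) = NAME_LEN.unpack_from(buf, pos + len(magic))
            name_at = pos + len(magic) + NAME_LEN.size
            name = bytes(buf[name_at:name_at + name_len]).decode("ascii")
            (size,) = TOTAL_SIZE.unpack_from(buf, name_at + name_len)
            yield pos, name, size
        except (struct.error, UnicodeDecodeError):
            pass  # truncated or damaged header: skip this hit
        pos = buf.find(magic, pos + 1)


def carve(image_path, out_dir):
    """Copy every candidate file out of the image; return the names written."""
    written = []
    with open(image_path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as buf:
        for offset, name, size in find_headers(buf):
            end = offset + size
            if not name or end > len(buf):
                continue  # empty name, or claimed size runs past the image
            target = os.path.join(out_dir, os.path.basename(name))
            with open(target, "wb") as out:
                for start in range(offset, end, CHUNK):
                    out.write(buf[start:min(start + CHUNK, end)])
            written.append(name)
    return written
```

A file carved this way is only a candidate: it still has to be validated against whatever the real header and payload allow, and files whose blocks are not contiguous in the image would need the block pointers from the GFS2 metadata instead.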

So that's my theory, but this could turn into an Easter vacation project
without the right knowledge of gfs2.
As I'm lucky to have the gfs2-utils source, I hope it can be done.
But if there is a simpler way to do a recovery with the installed gfs2
programs like gfs2_edit or gfs2_tool, or with other tools, it would be
nice if someone could show me the proper way.

Many Thanks in advance

Markus

-- 
*******************************************************
Markus Wolfgart
DLR Oberpfaffenhofen
German Remote Sensing Data Center
.
.
.
e-mail: markus.wolfgart <at> dlr.de
**********************************************************

 ----- "Markus Wolfgart" <markus wolfgart dlr de> wrote:
| Hello Cluster and GFS Experts,
|
| I'm a new subscriber to this mailing list and apologise
| in case my posting is off-topic.
|
| I'm looking for help concerning a corrupt gfs2 file system
| which I could not recover with fsck.gfs2 (Ver. 3.0.9)
| due to too little physical memory (4GB), even after increasing it
| with additional swap space (now about 35GB).
|
| I would like to parse an image created from the lost fs (the first 6TB)
| with the code provided in the new gfs2-utils release.
|
| Given this circumstance, I hope to find in this mailing list some
| hints concerning an automated step-by-step recovery of lost data.
|
| Many thanks in advance for your help
|
| Markus

Hi Markus,

You said that fsck.gfs2 is not working but you did not say what
messages it gives you when you try.  This must be a very big
file system.  How big is it?  Was it converted from gfs1?

Regards,

Bob Peterson
Red Hat File Systems

Joseph L. Casale | 2 Apr 2010 18:36

Re: Is it worth setting up jumbo Ethernet frames on the cluster interconnect link?

>On the other hand, if you are using Ethernet for storage, you almost certainly want jumbo frames on any such interfaces.

Can't say I agree with a simple broad statement like that.
Rather than restate what smarter people than me have already done, here
is a quote from a dev in the IET list I have come to trust:

http://old.nabble.com/Re%3A-Performance-increase-p10218304.html

Test with and without; simply enabling jumbo frames isn't certain to
help. It may or may not.
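
As a rough illustration of why the gain is not a given, here is a small sketch of the theoretical wire efficiency for TCP over IPv4 at two MTUs. It assumes standard Ethernet framing overhead and IPv4/TCP headers without options, and it says nothing about CPU load, interrupt rates or switch behaviour, which is why measuring remains the only real answer.

```python
# Per-frame bytes on the wire that are not part of the IP packet:
# preamble + SFD (8), Ethernet header (14), FCS (4), inter-frame gap (12).
ETH_OVERHEAD = 8 + 14 + 4 + 12
# IPv4 and TCP headers, assuming no options on either.
IP_TCP_HEADERS = 20 + 20


def tcp_payload_efficiency(mtu):
    """Fraction of on-wire bytes that carry TCP payload at a given MTU."""
    payload = mtu - IP_TCP_HEADERS
    on_wire = mtu + ETH_OVERHEAD
    return payload / on_wire


for mtu in (1500, 9000):
    print(f"MTU {mtu}: {tcp_payload_efficiency(mtu):.1%} payload")
# MTU 1500: 94.9% payload
# MTU 9000: 99.1% payload
```

On paper that is a gain of roughly four percentage points in payload share; whether any of it shows up in practice depends on the hardware and workload.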

Arun Kp | 3 Apr 2010 13:32

Re: Linux-cluster Digest, Vol 71, Issue 46

Dear All,

May I know how to set up active-active clustering in RHEL 5.4?

--
Thanks & Regards,
Arun K P
HCL Infosystems Ltd
Kolkata 26


On 30 March 2010 21:30, <linux-cluster-request <at> redhat.com> wrote:
Send Linux-cluster mailing list submissions to
       linux-cluster <at> redhat.com

To subscribe or unsubscribe via the World Wide Web, visit
       https://www.redhat.com/mailman/listinfo/linux-cluster
or, via email, send a message with subject or body 'help' to
       linux-cluster-request <at> redhat.com

You can reach the person managing the list at
       linux-cluster-owner <at> redhat.com

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Linux-cluster digest..."


Today's Topics:

  1. RHEL5.4: conga luci - Runtime Error: maximum recursion depth
     exceeded (Hofmeister, James (WTEC Linux))
  2. why does ip.sh launch rdisc ? (Martin Waite)


----------------------------------------------------------------------

Message: 1
Date: Mon, 29 Mar 2010 18:11:45 +0000
From: "Hofmeister, James (WTEC Linux)" <james.hofmeister <at> hp.com>
To: "linux-cluster <at> redhat.com" <linux-cluster <at> redhat.com>
Subject: [Linux-cluster] RHEL5.4: conga luci - Runtime Error: maximum
       recursion depth exceeded
Message-ID:
       <EC61DD7B6048464AB0E1B713AF7521BC1760ECB4E7 <at> GVW0676EXC.americas.hpqcorp.net>

Content-Type: text/plain; charset="us-ascii"

Hello All,
RE: RHEL5.4: conga luci - Runtime Error: maximum recursion depth exceeded

Has anybody seen this?    RHEL5.4 with ricci-0.12.2-6.el5_4.1-x86_64 and luci-0.12.1-7.el5.x86_64:

Runtime Error
 Sorry, a site error occurred.

 Traceback (innermost last):

     * Module ZPublisher.Publish, line 196, in publish_module_standard

     * Module Products.PlacelessTranslationService.PatchStringIO, line 34, in new_publish
     * Module ZPublisher.Publish, line 146, in publish
     * Module Zope2.App.startup, line 222, in zpublisher_exception_hook
     * Module ZPublisher.Publish, line 121, in publish
     * Module Zope2.App.startup, line 240, in commit
     * Module transaction._manager, line 96, in commit
     * Module transaction._transaction, line 380, in commit
     * Module transaction._transaction, line 378, in commit
     * Module transaction._transaction, line 433, in _commitResources
     * Module ZODB.Connection, line 484, in commit
     * Module ZODB.Connection, line 526, in _commit
     * Module ZODB.Connection, line 553, in _store_objects
     * Module ZODB.serialize, line 407, in serialize
     * Module ZODB.serialize, line 416, in _dump

 Runtime Error: maximum recursion depth exceeded (Also, the following
 error occurred while attempting to render the standard error message,
 please see the event log for full details: An operation previously
 failed, with traceback:
   File "/usr/lib64/luci/zope/lib/python/ZServer/PubCore/ZServerPublisher.py", line 23, in __init__
     response=response)
   File "/usr/lib64/luci/zope/lib/python/ZPublisher/Publish.py", line 395, in publish_module
     environ, debug, request, response)
   File "/usr/lib64/luci/zope/lib/python/ZPublisher/Publish.py", line 196, in publish_module_standard
     response = publish(request, module_name, after list, debug=debug)
   File "/usr/lib64/luci/zope/lib/python/Products/PlacelessTranslationService/PatchStringIO.py", line 34, in new_publish
     x = Publish.old_publish(request, module_name, after_list, debug)
   File "/usr/lib64/luci/zope/lib/python/ZPublisher/Publish.py", line 121, in publish
     transactions_manager.commit()
   File "/usr/lib64/luci/zope/lib/python/Zope2/App/startup.py", line 240, in commit
     transaction.commit()
   File "/usr/lib64/luci/zope/lib/python/transaction/_manager.py", line 96, in commit
     return self.get().commit(sub, deprecation_wng=False)
   File "/usr/lib64/luci/zope/lib/python/transaction/_transaction.py", line 380, in commit
     self._saveCommitishError() # This raises!
   File "/usr/lib64/luci/zope/lib/python/transaction/_transaction.py", line 378, in commit
     self._commitResources()
   File "/usr/lib64/luci/zope/lib/python/transaction/_transaction.py", line 433, in _commitResources
     rm.commit(self)
   File "/usr/lib64/luci/zope/lib/python/ZODB/Connection.py", line 484, in commit
     self._commit(transaction)
   File "/usr/lib64/luci/zope/lib/python/ZODB/Connection.py", line 526, in _commit
     self._store_objects(ObjectWriter(obj), transaction)
   File "/usr/lib64/luci/zope/lib/python/ZODB/Connection.py", line 553, in _store_objects
     p = writer.serialize(obj) # This calls __getstate__ of obj
   File "/usr/lib64/luci/zope/lib/python/ZODB/serialize.py", line 407, in serialize
     return self._dump(meta, obj.__getstate__())
   File "/usr/lib64/luci/zope/lib/python/ZODB/serialize.py", line 416, in _dump
     self._p.dump(state)
 RuntimeError: maximum recursion depth exceeded )

Regards,
 James Hofmeister




------------------------------

Message: 2
Date: Tue, 30 Mar 2010 05:11:51 +0100
From: "Martin Waite" <Martin.Waite <at> datacash.com>
To: <Linux-cluster <at> redhat.com>
Subject: [Linux-cluster] why does ip.sh launch rdisc ?
Message-ID:
       <A78DB34D00374344A0AB65B6523C05DC03D205D6 <at> marsden.win.datacash.com>
Content-Type: text/plain;       charset="iso-8859-1"

Hi,

I have noticed that rdisc - apparently a router discovery protocol daemon - has started running on nodes that take possession of a VIP using ip.sh.

I am not familiar with rdisc.   It is currently installed on all my RHEL hosts, but is not running.

Do I need to run rdisc ?

Also, the man page says that rdisc uses 224.0.0.1 as a multicast address.  So does my current cman configuration.  Should I configure cman to avoid this address ?

regards,
Martin




------------------------------

--
Linux-cluster mailing list
Linux-cluster <at> redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

End of Linux-cluster Digest, Vol 71, Issue 46
*********************************************




Jakov Sosic | 4 Apr 2010 03:44

Re: rgmanager and clvm don't work after reboot

On 03/24/2010 04:05 PM, Jakov Sosic wrote:
> On 03/23/2010 11:30 PM, Jakov Sosic wrote:
>> On 03/23/2010 09:10 PM, Jakov Sosic wrote:
>>
>>> Is this a similar issue? Services trying to communicate with member 0,
>>> which is a qdisk and not a real member? :-/
>>
>>
>> If I start clvmd with "-d 2" options (debug), I get this:
>>
>> # /etc/init.d/clvmd-debug start
>> Starting clvmd: CLVMD[eefac820]: Mar 23 23:23:32 CLVMD started
>> CLVMD[eefac820]: Mar 23 23:23:32 Connected to CMAN
>> CLVMD[eefac820]: Mar 23 23:23:32 CMAN initialisation complete
>>
>> and it hangs there...
> 
> Also it seems that this disturbed all the instances of clvm on other
> nodes too :( So now I can't 'lvs' or 'vgs' on any node...
> 
> It seems that cluster restart is imminent :(
> 
> This seems like a horrible bug :(
> 
> 

So no opinions on this one? I still have a locked and unusable cluster
which I have to reboot because of this :( And the worst part is that the
problem is reproducible: it's enough to leave the cluster running for a
couple of days and then reboot a single node...

-- 
|    Jakov Sosic    |    ICQ: 28410271    |   PGP: 0x965CAE2D   |
=================================================================
| start fighting cancer -> http://www.worldcommunitygrid.org/   |

