ted | 1 Aug 2008 03:01

Re: Some nodes won't join after being fenced

We seem to have found part of the culprit.
 
We're using an Extreme switch that handles all of our traffic in separate VLANs, and the IGMP bits of the ExtremeOS seem to be interfering with the cluster's ability to recover itself from such an episode.
 
At the moment we're leaning towards the Juniper switch: we moved to an identically configured (as far as ports and VLANs go) Juniper EX-4200, and the cluster was able to recover itself after a single node (of nine) was fenced.
 
On the Extreme, each node had to be fenced in turn before the cluster could recover fully.  By "recover fully" we mean that each node can mount the GFS file system r/w and can actually write and delete test files on the mount point.
 
Our testing continues and we're trying to come up with "real" evidence such as proof that some parts of the multicast traffic are or aren't being dealt with properly.  So far the empirical evidence suggests the above conclusions.
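
One way to gather that kind of evidence is to capture IGMP and cluster multicast traffic on a node while another node is fenced and rejoins. The following is only a sketch: the helper name mcast_capture_cmd and the default interface eth0 are assumptions, and the group address is the one shown in the cman_tool status output quoted below. It prints the tcpdump command rather than running it, so it can be reviewed first.

```shell
# Hypothetical helper: print a tcpdump command that captures IGMP
# membership traffic plus packets sent to the cluster multicast group.
# $1 = interface (assumed default: eth0)
# $2 = multicast group (default taken from the cman_tool status output)
mcast_capture_cmd() {
  iface="${1:-eth0}"
  group="${2:-239.192.6.148}"
  printf "tcpdump -n -i %s 'igmp or dst host %s'\n" "$iface" "$group"
}
```

Running the printed command as root on a healthy node and on a rejoining node, and comparing what each one sees, may show whether the switch stops forwarding the group to a port after a fence.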
 
-ted

 
On 7/31/08, Brandon Young <bkyoung <at> gmail.com> wrote:
I have occasionally run into this problem, too.  I have found that sometimes I can work around the problem by chkconfig'ing clvmd, cman, and rgmanager off, rebooting, then manually starting cman, rgmanager, clvmd (in that order).  Usually, after that, I am able to fence the node(s) and they will rejoin automatically (after re-enabling automatic startup with chkconfig, of course).  I know this workaround doesn't explain *why* it happens, but it has more than once helped me get my cluster nodes back online without having to reboot all the nodes.
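
The workaround above could be scripted roughly as follows. This is a sketch under assumptions, not a tested procedure: the run helper, the DRY_RUN variable, and the three function names are made up for illustration, and only the chkconfig and service commands come from the description. With DRY_RUN=1 (the default here) the commands are printed, not executed.

```shell
# DRY_RUN=1 (default) only prints each command; set DRY_RUN=0 to execute.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo or execute a command depending on DRY_RUN.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Step 1, before rebooting the node: turn off automatic startup.
disable_autostart() {
  for svc in clvmd cman rgmanager; do
    run chkconfig "$svc" off
  done
}

# Step 2, after the reboot: start the services by hand, in this order.
start_in_order() {
  for svc in cman rgmanager clvmd; do
    run service "$svc" start
  done
}

# Step 3, once the node has rejoined: restore automatic startup.
enable_autostart() {
  for svc in cman rgmanager clvmd; do
    run chkconfig "$svc" on
  done
}
```

The reboot between step 1 and step 2 is left as a manual action on purpose, since the script cannot carry on across it.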


On Thu, Jul 31, 2008 at 1:42 PM, Mailing List <ml <at> adamdein.com> wrote:
Hello,

I currently have a 9-node CentOS 5.1 cman/gfs cluster which I've managed to break.

It is broken in almost exactly the same way as stated in these two previous threads:

http://www.spinics.net/lists/cluster/msg10304.html
http://www.redhat.com/archives/linux-cluster/2008-May/msg00060.html

However, I can find no resolution in the archives. My only guaranteed resolution at this point is a cold restart of all nodes, which to me seems ridiculous (i.e., I'm missing something).

To add a little detail, I have nodes cluster1...9. Nodes 7 & 8 are broken. When I fence/reboot them, cman starts but times out on starting fencing. cman_tool nodes shows them as joined but the fence domain looks broken.

Any ideas?

I have included some information for a good node, bad node, and /var/log/messages from a good node that did the fencing.

Good Node:

[root <at> cluster1 ~]# cman_tool nodes
Node  Sts   Inc   Joined               Name
  1   M    768   2008-07-31 12:47:19  cluster1-rhc
  2   M    776   2008-07-31 12:47:37  cluster2-rhc
  3   M    772   2008-07-31 12:47:19  cluster3-rhc
  4   M    788   2008-07-31 12:56:20  cluster4-rhc
  5   M    772   2008-07-31 12:47:19  cluster5-rhc
  6   M    784   2008-07-31 12:52:50  cluster6-rhc
  7   M    808   2008-07-31 13:24:24  cluster7-rhc
  8   X    800                        cluster8-rhc
  9   M    772   2008-07-31 12:47:19  cluster9-rhc
[root <at> cluster1 ~]# cman_tool services
type             level name      id       state
fence            0     default   00010003 FAIL_START_WAIT
[1 2 3 4 5 6 9]
dlm              1     testgfs1  00020005 none
[1 2 3 4 5 6]
gfs              2     testgfs1  00010005 none
[1 2 3 4 5 6]
[root <at> cluster1 ~]# cman_tool status
Version: 6.1.0
Config Version: 13
Cluster Name: test
Cluster Id: 1678
Cluster Member: Yes
Cluster Generation: 808
Membership state: Cluster-Member
Nodes: 8
Expected votes: 9
Total votes: 8
Quorum: 5
Active subsystems: 7
Flags: Dirty
Ports Bound: 0
Node name: cluster1-rhc
Node ID: 1
Multicast addresses: 239.192.6.148
Node addresses: 10.128.161.81
[root <at> cluster1 ~]# group_tool
type             level name      id       state
fence            0     default   00010003 FAIL_START_WAIT
[1 2 3 4 5 6 9]
dlm              1     testgfs1  00020005 none
[1 2 3 4 5 6]
gfs              2     testgfs1  00010005 none
[1 2 3 4 5 6]
[root <at> cluster1 ~]#


Bad/broken Node:

[root <at> cluster7 ~]# cman_tool nodes
Node  Sts   Inc   Joined               Name
  1   M    808   2008-07-31 13:24:24  cluster1-rhc
  2   M    808   2008-07-31 13:24:24  cluster2-rhc
  3   M    808   2008-07-31 13:24:24  cluster3-rhc
  4   M    808   2008-07-31 13:24:24  cluster4-rhc
  5   M    808   2008-07-31 13:24:24  cluster5-rhc
  6   M    808   2008-07-31 13:24:24  cluster6-rhc
  7   M    804   2008-07-31 13:24:24  cluster7-rhc
  8   X      0                        cluster8-rhc
  9   M    808   2008-07-31 13:24:24  cluster9-rhc
[root <at> cluster7 ~]# cman_tool services
type             level name     id       state
fence            0     default  00000000 JOIN_STOP_WAIT
[1 2 3 4 5 6 7 9]
[root <at> cluster7 ~]# cman_tool status
Version: 6.1.0
Config Version: 13
Cluster Name: test
Cluster Id: 1678
Cluster Member: Yes
Cluster Generation: 808
Membership state: Cluster-Member
Nodes: 8
Expected votes: 9
Total votes: 8
Quorum: 5
Active subsystems: 7
Flags: Dirty
Ports Bound: 0
Node name: cluster7-rhc
Node ID: 7
Multicast addresses: 239.192.6.148
Node addresses: 10.128.161.87
[root <at> cluster7 ~]# group_tool
type             level name     id       state
fence            0     default  00000000 JOIN_STOP_WAIT
[1 2 3 4 5 6 7 9]
[root <at> cluster7 ~]#
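
The difference between the two nodes is easiest to see in the state column of the fence group. As a small sketch (the function names fence_state and fence_is_settled are hypothetical, and treating "none" as the settled state is an assumption based on the dlm and gfs rows above), the state can be pulled out of group_tool output like this:

```shell
# Read group_tool (or cman_tool services) output on stdin and print the
# state column of the first "fence" row, e.g. FAIL_START_WAIT.
fence_state() {
  awk '$1 == "fence" { print $5; exit }'
}

# Succeed only if the fence group reports "none", which is assumed here
# to mean that no membership change is pending.
fence_is_settled() {
  [ "$(fence_state)" = "none" ]
}
```

Usage would be along the lines of "group_tool | fence_state" on each node; on the outputs above it would give FAIL_START_WAIT for cluster1 and JOIN_STOP_WAIT for cluster7.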


/var/log/messages:

Jul 31 13:20:54 cluster3 fence_node[3813]: Fence of "cluster7-rhc" was successful
Jul 31 13:21:03 cluster3 fence_node[3815]: Fence of "cluster8-rhc" was successful
Jul 31 13:21:11 cluster3 openais[3084]: [TOTEM] entering GATHER state from 12.
Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] entering GATHER state from 11.
Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] Saving state aru 89 high seq received 89
Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] Storing new sequence id for ring 324
Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] entering COMMIT state.
Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] entering RECOVERY state.
Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] position [0] member 10.128.161.81:
Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] previous ring seq 800 rep 10.128.161.81
Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] aru 89 high delivered 89 received flag 1
Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] position [1] member 10.128.161.82:
Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] previous ring seq 800 rep 10.128.161.81
Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] aru 89 high delivered 89 received flag 1
Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] position [2] member 10.128.161.83:
Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] previous ring seq 800 rep 10.128.161.81
Jul 31 13:21:16 cluster3 kernel: dlm: closing connection to node 7
Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] aru 89 high delivered 89 received flag 1
Jul 31 13:21:16 cluster3 kernel: dlm: closing connection to node 8
Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] position [3] member 10.128.161.84:
Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] previous ring seq 800 rep 10.128.161.81
Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] aru 89 high delivered 89 received flag 1
Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] position [4] member 10.128.161.85:
Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] previous ring seq 800 rep 10.128.161.81
Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] aru 89 high delivered 89 received flag 1
Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] position [5] member 10.128.161.86:
Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] previous ring seq 800 rep 10.128.161.81
Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] aru 89 high delivered 89 received flag 1
Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] position [6] member 10.128.161.89:
Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] previous ring seq 800 rep 10.128.161.81
Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] aru 89 high delivered 89 received flag 1
Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] Did not need to originate any messages in recovery.
Jul 31 13:21:16 cluster3 openais[3084]: [CLM  ] CLM CONFIGURATION CHANGE
Jul 31 13:21:16 cluster3 openais[3084]: [CLM  ] New Configuration:
Jul 31 13:21:16 cluster3 openais[3084]: [CLM  ]         r(0) ip(10.128.161.81)
Jul 31 13:21:16 cluster3 openais[3084]: [CLM  ]         r(0) ip(10.128.161.82)
Jul 31 13:21:16 cluster3 openais[3084]: [CLM  ]         r(0) ip(10.128.161.83)
Jul 31 13:21:16 cluster3 openais[3084]: [CLM  ]         r(0) ip(10.128.161.84)
Jul 31 13:21:16 cluster3 openais[3084]: [CLM  ]         r(0) ip(10.128.161.85)
Jul 31 13:21:16 cluster3 openais[3084]: [CLM  ]         r(0) ip(10.128.161.86)
Jul 31 13:21:16 cluster3 openais[3084]: [CLM  ]         r(0) ip(10.128.161.89)
Jul 31 13:21:16 cluster3 openais[3084]: [CLM  ] Members Left:
Jul 31 13:21:16 cluster3 openais[3084]: [CLM  ]         r(0) ip(10.128.161.87)
Jul 31 13:21:16 cluster3 openais[3084]: [CLM  ]         r(0) ip(10.128.161.88)
Jul 31 13:21:16 cluster3 openais[3084]: [CLM  ] Members Joined:
Jul 31 13:21:16 cluster3 openais[3084]: [CLM  ] CLM CONFIGURATION CHANGE
Jul 31 13:21:16 cluster3 openais[3084]: [CLM  ] New Configuration:
Jul 31 13:21:16 cluster3 openais[3084]: [CLM  ]         r(0) ip(10.128.161.81)
Jul 31 13:21:16 cluster3 openais[3084]: [CLM  ]         r(0) ip(10.128.161.82)
Jul 31 13:21:16 cluster3 openais[3084]: [CLM  ]         r(0) ip(10.128.161.83)
Jul 31 13:21:16 cluster3 openais[3084]: [CLM  ]         r(0) ip(10.128.161.84)
Jul 31 13:21:16 cluster3 openais[3084]: [CLM  ]         r(0) ip(10.128.161.85)
Jul 31 13:21:16 cluster3 openais[3084]: [CLM  ]         r(0) ip(10.128.161.86)
Jul 31 13:21:16 cluster3 openais[3084]: [CLM  ]         r(0) ip(10.128.161.89)
Jul 31 13:21:16 cluster3 openais[3084]: [CLM  ] Members Left:
Jul 31 13:21:16 cluster3 openais[3084]: [CLM  ] Members Joined:
Jul 31 13:21:16 cluster3 openais[3084]: [SYNC ] This node is within the primary component and will provide service.
Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] entering OPERATIONAL state.
Jul 31 13:21:16 cluster3 openais[3084]: [CLM  ] got nodejoin message 10.128.161.81
Jul 31 13:21:16 cluster3 openais[3084]: [CLM  ] got nodejoin message 10.128.161.82
Jul 31 13:21:16 cluster3 openais[3084]: [CLM  ] got nodejoin message 10.128.161.83
Jul 31 13:21:16 cluster3 openais[3084]: [CLM  ] got nodejoin message 10.128.161.84
Jul 31 13:21:16 cluster3 openais[3084]: [CLM  ] got nodejoin message 10.128.161.85
Jul 31 13:21:16 cluster3 openais[3084]: [CLM  ] got nodejoin message 10.128.161.86
Jul 31 13:21:16 cluster3 openais[3084]: [CLM  ] got nodejoin message 10.128.161.89
Jul 31 13:21:16 cluster3 openais[3084]: [CPG  ] got joinlist message from node 2
Jul 31 13:21:16 cluster3 openais[3084]: [CPG  ] got joinlist message from node 3
Jul 31 13:21:16 cluster3 openais[3084]: [CPG  ] got joinlist message from node 4
Jul 31 13:21:16 cluster3 openais[3084]: [CPG  ] got joinlist message from node 5
Jul 31 13:21:16 cluster3 openais[3084]: [CPG  ] got joinlist message from node 6
Jul 31 13:21:16 cluster3 openais[3084]: [CPG  ] got joinlist message from node 9
Jul 31 13:21:16 cluster3 openais[3084]: [CPG  ] got joinlist message from node 1
Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] entering GATHER state from 11.
Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] Saving state aru 68 high seq received 68
Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] Storing new sequence id for ring 328
Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] entering COMMIT state.
Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] entering RECOVERY state.
Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] position [0] member 10.128.161.81:
Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] previous ring seq 804 rep 10.128.161.81
Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] aru 68 high delivered 68 received flag 1
Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] position [1] member 10.128.161.82:
Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] previous ring seq 804 rep 10.128.161.81
Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] aru 68 high delivered 68 received flag 1
Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] position [2] member 10.128.161.83:
Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] previous ring seq 804 rep 10.128.161.81
Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] aru 68 high delivered 68 received flag 1
Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] position [3] member 10.128.161.84:
Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] previous ring seq 804 rep 10.128.161.81
Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] aru 68 high delivered 68 received flag 1
Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] position [4] member 10.128.161.85:
Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] previous ring seq 804 rep 10.128.161.81
Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] aru 68 high delivered 68 received flag 1
Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] position [5] member 10.128.161.86:
Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] previous ring seq 804 rep 10.128.161.81
Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] aru 68 high delivered 68 received flag 1
Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] position [6] member 10.128.161.87:
Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] previous ring seq 804 rep 10.128.161.87
Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] aru 9 high delivered 9 received flag 1
Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] position [7] member 10.128.161.89:
Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] previous ring seq 804 rep 10.128.161.81
Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] aru 68 high delivered 68 received flag 1
Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] Did not need to originate any messages in recovery.
Jul 31 13:24:24 cluster3 openais[3084]: [CLM  ] CLM CONFIGURATION CHANGE
Jul 31 13:24:24 cluster3 openais[3084]: [CLM  ] New Configuration:
Jul 31 13:24:24 cluster3 openais[3084]: [CLM  ]         r(0) ip(10.128.161.81)
Jul 31 13:24:24 cluster3 openais[3084]: [CLM  ]         r(0) ip(10.128.161.82)
Jul 31 13:24:24 cluster3 openais[3084]: [CLM  ]         r(0) ip(10.128.161.83)
Jul 31 13:24:24 cluster3 openais[3084]: [CLM  ]         r(0) ip(10.128.161.84)
Jul 31 13:24:24 cluster3 openais[3084]: [CLM  ]         r(0) ip(10.128.161.85)
Jul 31 13:24:24 cluster3 openais[3084]: [CLM  ]         r(0) ip(10.128.161.86)
Jul 31 13:24:24 cluster3 openais[3084]: [CLM  ]         r(0) ip(10.128.161.89)
Jul 31 13:24:24 cluster3 openais[3084]: [CLM  ] Members Left:
Jul 31 13:24:24 cluster3 openais[3084]: [CLM  ] Members Joined:
Jul 31 13:24:24 cluster3 openais[3084]: [CLM  ] CLM CONFIGURATION CHANGE
Jul 31 13:24:24 cluster3 openais[3084]: [CLM  ] New Configuration:
Jul 31 13:24:24 cluster3 openais[3084]: [CLM  ]         r(0) ip(10.128.161.81)
Jul 31 13:24:24 cluster3 openais[3084]: [CLM  ]         r(0) ip(10.128.161.82)
Jul 31 13:24:24 cluster3 openais[3084]: [CLM  ]         r(0) ip(10.128.161.83)
Jul 31 13:24:24 cluster3 openais[3084]: [CLM  ]         r(0) ip(10.128.161.84)
Jul 31 13:24:24 cluster3 openais[3084]: [CLM  ]         r(0) ip(10.128.161.85)
Jul 31 13:24:24 cluster3 openais[3084]: [CLM  ]         r(0) ip(10.128.161.86)
Jul 31 13:24:24 cluster3 openais[3084]: [CLM  ]         r(0) ip(10.128.161.87)
Jul 31 13:24:24 cluster3 openais[3084]: [CLM  ]         r(0) ip(10.128.161.89)
Jul 31 13:24:24 cluster3 openais[3084]: [CLM  ] Members Left:
Jul 31 13:24:24 cluster3 openais[3084]: [CLM  ] Members Joined:
Jul 31 13:24:24 cluster3 openais[3084]: [CLM  ]         r(0) ip(10.128.161.87)
Jul 31 13:24:24 cluster3 openais[3084]: [SYNC ] This node is within the primary component and will provide service.
Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] entering OPERATIONAL state.
Jul 31 13:24:24 cluster3 openais[3084]: [CLM  ] got nodejoin message 10.128.161.81
Jul 31 13:24:24 cluster3 openais[3084]: [CLM  ] got nodejoin message 10.128.161.82
Jul 31 13:24:24 cluster3 openais[3084]: [CLM  ] got nodejoin message 10.128.161.83
Jul 31 13:24:24 cluster3 openais[3084]: [CLM  ] got nodejoin message 10.128.161.84
Jul 31 13:24:24 cluster3 openais[3084]: [CLM  ] got nodejoin message 10.128.161.85
Jul 31 13:24:24 cluster3 openais[3084]: [CLM  ] got nodejoin message 10.128.161.86
Jul 31 13:24:24 cluster3 openais[3084]: [CLM  ] got nodejoin message 10.128.161.87
Jul 31 13:24:24 cluster3 openais[3084]: [CLM  ] got nodejoin message 10.128.161.89
Jul 31 13:24:24 cluster3 openais[3084]: [CPG  ] got joinlist message from node 6
Jul 31 13:24:24 cluster3 openais[3084]: [CPG  ] got joinlist message from node 9
Jul 31 13:24:24 cluster3 openais[3084]: [CPG  ] got joinlist message from node 1
Jul 31 13:24:24 cluster3 openais[3084]: [CPG  ] got joinlist message from node 2
Jul 31 13:24:24 cluster3 openais[3084]: [CPG  ] got joinlist message from node 3
Jul 31 13:24:24 cluster3 openais[3084]: [CPG  ] got joinlist message from node 4
Jul 31 13:24:24 cluster3 openais[3084]: [CPG  ] got joinlist message from node 5
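
A log like the one above is easier to read when reduced to the membership changes. The following sketch (clm_changes is a hypothetical name, and it assumes the syslog line layout shown above) prints one line per address listed under "Members Left:" or "Members Joined:":

```shell
# Read /var/log/messages lines on stdin and print "<time> left|joined <ip>"
# for every address that openais lists under Members Left / Members Joined.
clm_changes() {
  awk '
    /Members Left:/   { section = "left";   next }
    /Members Joined:/ { section = "joined"; next }
    section != "" && /r\(0\) ip\(/ {
      ip = $0
      sub(/.*ip\(/, "", ip)   # drop everything up to and including "ip("
      sub(/\).*/, "", ip)     # drop the closing parenthesis and the rest
      print $3, section, ip
      next
    }
    { section = "" }          # any other line ends the current list
  '
}
```

It could be run as "clm_changes < /var/log/messages" on the node that did the fencing.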

Thanks!

Adam

--
Linux-cluster mailing list
Linux-cluster <at> redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster



<div>
<div>We seem to have found part of the culprit.</div>
<div>&nbsp;</div>
<div>We're using an Extreme switch that handles all of our traffic in separate VLANs, and the IGMP bits of the ExtremeOS seem to be interfering with the cluster's ability to recover itself from such an episode.</div>

<div>&nbsp;</div>
<div>At the moment we're leaning towards the Juniper switch: we moved to an identically configured (as far as ports and VLANs go) Juniper EX-4200, and the cluster was able to recover itself after a single node (of nine) was fenced.</div>

<div>&nbsp;</div>
<div>On the Extreme, each node had to be fenced in turn before the cluster could recover fully.&nbsp; By "recover fully" we mean that each node can mount the GFS file system r/w and can actually write and delete test files on the mount point.</div>

<div>&nbsp;</div>
<div>Our testing continues and we're trying to come up with "real" evidence such as proof that some parts of the multicast traffic are or aren't being dealt with properly.&nbsp; So far the empirical evidence suggests the above conclusions.</div>

<div>&nbsp;</div>
<div>-ted<br><br>&nbsp;</div>
<div>
<span class="gmail_quote">On 7/31/08, Brandon Young &lt;<a href="mailto:bkyoung <at> gmail.com">bkyoung <at> gmail.com</a>&gt; wrote:</span>
<blockquote class="gmail_quote">
<div dir="ltr">I have occasionally run into this problem, too.&nbsp; I have found that sometimes I can work around the problem by chkconfig'ing clvmd,cman,and rgmanager off, rebooting, then manually starting cman, rgmanager, clvmd (in that order).&nbsp; Usually, after that, I am able to fence the node(s) and they will rejoin automatically (after re-enabling automatic startup with chkconfig, of course).&nbsp; I know this workaround doesn't explain *why* it happens, but it has more than once helped me get my cluster nodes back online without having to reboot all the nodes. 
<div><span class="e"><br><br><div class="gmail_quote">On Thu, Jul 31, 2008 at 1:42 PM, Mailing List <span dir="ltr">&lt;<a onclick="return top.js.OpenExtLink(window,event,this)" href="mailto:ml <at> adamdein.com" target="_blank">ml <at> adamdein.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote">Hello,<br><br>I currently have a 9 node centos 5.1 cman/gfs cluster which I've managed to break.<br><br>It is broken in almost exactly the same way as stated in these two previous threads:<br><br><a onclick="return top.js.OpenExtLink(window,event,this)" href="http://www.spinics.net/lists/cluster/msg10304.html" target="_blank">http://www.spinics.net/lists/cluster/msg10304.html</a><br><a onclick="return top.js.OpenExtLink(window,event,this)" href="http://www.redhat.com/archives/linux-cluster/2008-May/msg00060.html" target="_blank">http://www.redhat.com/archives/linux-cluster/2008-May/msg00060.html</a><br><br>However, I can find no resolution in the archives. My only guaranteed resolution at this point is a cold restart of all nodes which to me seems ridiculous (ie: I'm missing something).<br><br>To add a little details, I have nodes cluster1...9. Nodes 7 &amp; 8 are broken. When I fence/reboot them, cman starts but times out on starting fencing. cman_tools nodes shows them as joined but the fence domain looks broke.<br><br>Any ideas?<br><br>I have included some information for a good node, bad node, and /var/log/messages from a good node that did the fencing.<br><br>Good Node:<br><br>[root <at> cluster1 ~]# cman_tool nodes<br>Node &nbsp;Sts &nbsp; Inc &nbsp; Joined &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Name<br>
&nbsp; 1 &nbsp; M &nbsp; &nbsp;768 &nbsp; 2008-07-31 12:47:19 &nbsp;cluster1-rhc<br>&nbsp; 2 &nbsp; M &nbsp; &nbsp;776 &nbsp; 2008-07-31 12:47:37 &nbsp;cluster2-rhc<br>&nbsp; 3 &nbsp; M &nbsp; &nbsp;772 &nbsp; 2008-07-31 12:47:19 &nbsp;cluster3-rhc<br>&nbsp; 4 &nbsp; M &nbsp; &nbsp;788 &nbsp; 2008-07-31 12:56:20 &nbsp;cluster4-rhc<br>&nbsp; 5 &nbsp; M &nbsp; &nbsp;772 &nbsp; 2008-07-31 12:47:19 &nbsp;cluster5-rhc<br>
&nbsp; 6 &nbsp; M &nbsp; &nbsp;784 &nbsp; 2008-07-31 12:52:50 &nbsp;cluster6-rhc<br>&nbsp; 7 &nbsp; M &nbsp; &nbsp;808 &nbsp; 2008-07-31 13:24:24 &nbsp;cluster7-rhc<br>&nbsp; 8 &nbsp; X &nbsp; &nbsp;800 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;cluster8-rhc<br>&nbsp; 9 &nbsp; M &nbsp; &nbsp;772 &nbsp; 2008-07-31 12:47:19 &nbsp;cluster9-rhc<br>[root <at> cluster1 ~]# cman_tool services<br>
type &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; level name &nbsp; &nbsp; &nbsp;id &nbsp; &nbsp; &nbsp; state<br>fence &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp; default &nbsp; 00010003 FAIL_START_WAIT<br>[1 2 3 4 5 6 9]<br>dlm &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;1 &nbsp; &nbsp; testgfs1 &nbsp;00020005 none<br>[1 2 3 4 5 6]<br>gfs &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;2 &nbsp; &nbsp; testgfs1 &nbsp;00010005 none<br>
[1 2 3 4 5 6]<br>[root <at> cluster1 ~]# cman_tool status<br>Version: 6.1.0<br>Config Version: 13<br>Cluster Name: test<br>Cluster Id: 1678<br>Cluster Member: Yes<br>Cluster Generation: 808<br>Membership state: Cluster-Member<br>
Nodes: 8<br>Expected votes: 9<br>Total votes: 8<br>Quorum: 5<br>Active subsystems: 7<br>Flags: Dirty<br>Ports Bound: 0<br>Node name: cluster1-rhc<br>Node ID: 1<br>Multicast addresses: <a onclick="return top.js.OpenExtLink(window,event,this)" href="http://239.192.6.148/" target="_blank">239.192.6.148</a><br>
Node addresses: <a onclick="return top.js.OpenExtLink(window,event,this)" href="http://10.128.161.81/" target="_blank">10.128.161.81</a><br>[root <at> cluster1 ~]# group_tool<br>type &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; level name &nbsp; &nbsp; &nbsp;id &nbsp; &nbsp; &nbsp; state<br>
fence &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp; default &nbsp; 00010003 FAIL_START_WAIT<br>[1 2 3 4 5 6 9]<br>dlm &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;1 &nbsp; &nbsp; testgfs1 &nbsp;00020005 none<br>[1 2 3 4 5 6]<br>gfs &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;2 &nbsp; &nbsp; testgfs1 &nbsp;00010005 none<br>[1 2 3 4 5 6]<br>[root <at> cluster1 ~]#<br><br><br>Bad/broken Node:<br><br>[root <at> cluster7 ~]# cman_tool nodes<br>Node &nbsp;Sts &nbsp; Inc &nbsp; Joined &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Name<br>&nbsp; 1 &nbsp; M &nbsp; &nbsp;808 &nbsp; 2008-07-31 13:24:24 &nbsp;cluster1-rhc<br>&nbsp; 2 &nbsp; M &nbsp; &nbsp;808 &nbsp; 2008-07-31 13:24:24 &nbsp;cluster2-rhc<br>
&nbsp; 3 &nbsp; M &nbsp; &nbsp;808 &nbsp; 2008-07-31 13:24:24 &nbsp;cluster3-rhc<br>&nbsp; 4 &nbsp; M &nbsp; &nbsp;808 &nbsp; 2008-07-31 13:24:24 &nbsp;cluster4-rhc<br>&nbsp; 5 &nbsp; M &nbsp; &nbsp;808 &nbsp; 2008-07-31 13:24:24 &nbsp;cluster5-rhc<br>&nbsp; 6 &nbsp; M &nbsp; &nbsp;808 &nbsp; 2008-07-31 13:24:24 &nbsp;cluster6-rhc<br>&nbsp; 7 &nbsp; M &nbsp; &nbsp;804 &nbsp; 2008-07-31 13:24:24 &nbsp;cluster7-rhc<br>
&nbsp; 8 &nbsp; X &nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;cluster8-rhc<br>&nbsp; 9 &nbsp; M &nbsp; &nbsp;808 &nbsp; 2008-07-31 13:24:24 &nbsp;cluster9-rhc<br>[root <at> cluster7 ~]# cman_tool services<br>type &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; level name &nbsp; &nbsp; id &nbsp; &nbsp; &nbsp; state<br>fence &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp; default &nbsp;00000000 JOIN_STOP_WAIT<br>
[1 2 3 4 5 6 7 9]<br>[root <at> cluster7 ~]# cman_tool status<br>Version: 6.1.0<br>Config Version: 13<br>Cluster Name: test<br>Cluster Id: 1678<br>Cluster Member: Yes<br>Cluster Generation: 808<br>Membership state: Cluster-Member<br>
Nodes: 8<br>Expected votes: 9<br>Total votes: 8<br>Quorum: 5<br>Active subsystems: 7<br>Flags: Dirty<br>Ports Bound: 0<br>Node name: cluster7-rhc<br>Node ID: 7<br>Multicast addresses: <a onclick="return top.js.OpenExtLink(window,event,this)" href="http://239.192.6.148/" target="_blank">239.192.6.148</a><br>
Node addresses: <a onclick="return top.js.OpenExtLink(window,event,this)" href="http://10.128.161.87/" target="_blank">10.128.161.87</a><br>[root <at> cluster7 ~]# group_tool<br>type &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; level name &nbsp; &nbsp; id &nbsp; &nbsp; &nbsp; state<br>
fence &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp; default &nbsp;00000000 JOIN_STOP_WAIT<br>[1 2 3 4 5 6 7 9]<br>[root <at> cluster7 ~]#<br><br><br>/var/log/messages:<br><br>Jul 31 13:20:54 cluster3 fence_node[3813]: Fence of "cluster7-rhc" was successful<br>
Jul 31 13:21:03 cluster3 fence_node[3815]: Fence of "cluster8-rhc" was successful<br>Jul 31 13:21:11 cluster3 openais[3084]: [TOTEM] entering GATHER state from 12.<br>Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] entering GATHER state from 11.<br>
Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] Saving state aru 89 high seq received 89<br>Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] Storing new sequence id for ring 324<br>Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] entering COMMIT state.<br>
Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] entering RECOVERY state.<br>Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] position [0] member <a onclick="return top.js.OpenExtLink(window,event,this)" href="http://10.128.161.81/" target="_blank">10.128.161.81</a>:<br>
Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] previous ring seq 800 rep <a onclick="return top.js.OpenExtLink(window,event,this)" href="http://10.128.161.81/" target="_blank">10.128.161.81</a><br>Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] aru 89 high delivered 89 received flag 1<br>
Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] position [1] member <a onclick="return top.js.OpenExtLink(window,event,this)" href="http://10.128.161.82/" target="_blank">10.128.161.82</a>:<br>Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] previous ring seq 800 rep <a onclick="return top.js.OpenExtLink(window,event,this)" href="http://10.128.161.81/" target="_blank">10.128.161.81</a><br>
Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] aru 89 high delivered 89 received flag 1<br>Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] position [2] member <a onclick="return top.js.OpenExtLink(window,event,this)" href="http://10.128.161.83/" target="_blank">10.128.161.83</a>:<br>
Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] previous ring seq 800 rep <a onclick="return top.js.OpenExtLink(window,event,this)" href="http://10.128.161.81/" target="_blank">10.128.161.81</a><br>Jul 31 13:21:16 cluster3 kernel: dlm: closing connection to node 7<br>
Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] aru 89 high delivered 89 received flag 1<br>Jul 31 13:21:16 cluster3 kernel: dlm: closing connection to node 8<br>Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] position [3] member <a onclick="return top.js.OpenExtLink(window,event,this)" href="http://10.128.161.84/" target="_blank">10.128.161.84</a>:<br>
Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] previous ring seq 800 rep <a onclick="return top.js.OpenExtLink(window,event,this)" href="http://10.128.161.81/" target="_blank">10.128.161.81</a><br>Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] aru 89 high delivered 89 received flag 1<br>
Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] position [4] member <a onclick="return top.js.OpenExtLink(window,event,this)" href="http://10.128.161.85/" target="_blank">10.128.161.85</a>:<br>Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] previous ring seq 800 rep <a onclick="return top.js.OpenExtLink(window,event,this)" href="http://10.128.161.81/" target="_blank">10.128.161.81</a><br>
Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] aru 89 high delivered 89 received flag 1<br>Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] position [5] member <a onclick="return top.js.OpenExtLink(window,event,this)" href="http://10.128.161.86/" target="_blank">10.128.161.86</a>:<br>
Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] previous ring seq 800 rep <a onclick="return top.js.OpenExtLink(window,event,this)" href="http://10.128.161.81/" target="_blank">10.128.161.81</a><br>Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] aru 89 high delivered 89 received flag 1<br>
Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] position [6] member <a onclick="return top.js.OpenExtLink(window,event,this)" href="http://10.128.161.89/" target="_blank">10.128.161.89</a>:<br>Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] previous ring seq 800 rep <a onclick="return top.js.OpenExtLink(window,event,this)" href="http://10.128.161.81/" target="_blank">10.128.161.81</a><br>
Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] aru 89 high delivered 89 received flag 1<br>Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] Did not need to originate any messages in recovery.<br>Jul 31 13:21:16 cluster3 openais[3084]: [CLM &nbsp;] CLM CONFIGURATION CHANGE<br>
</blockquote>
</div>
<br></span></div>
</div>
</blockquote>
</div>
<br>
</div>
Fabio M. Di Nitto | 1 Aug 2008 11:17

Cluster 2.99.07 (development snapshot) released


The cluster team and its community are proud to announce the 2.99.07 
release from the master branch.

The development cycle for 3.0 is proceeding at a very good speed and 
most likely one of the next releases will be 3.0alpha1. All features 
designed for 3.0 are being completed and taking proper shape, and the 
library API has been stable for some time (and will soon be marked as 3.0 
soname). Stay tuned for upcoming updates!

The 2.99.XX releases are _NOT_ meant to be used for production
environments... yet.

The master branch is the main development tree that receives all new
features, code, clean up and a whole brand new set of bugs.

At some point in time this code will become the 3.0 stable release.

Everybody with test equipment and time to spare is highly encouraged to
download, install and test the 2.99 releases and, more importantly, report
problems.

In order to build the 2.99.07 release you will need:

openais svn r1579. Porting to corosync is a work in progress.
linux kernel (2.6.26) from
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git
(but userland can run on 2.6.25 in compatibility mode)
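
As a rough illustration of the kernel requirement above, here is a minimal Python sketch that classifies a kernel release string against the two versions named in this announcement. The function names and the labels "full", "userland-compat" and "untested" are hypothetical, chosen only for this example; the announcement says nothing about kernels other than 2.6.26 and 2.6.25, so everything else is reported as "untested" rather than as unsupported.

```python
import re

# Versions taken from the announcement: 2.6.26 is the kernel to build
# against; userland can run on 2.6.25 in compatibility mode.
REQUIRED = (2, 6, 26)
COMPAT = (2, 6, 25)


def parse_kernel_release(release):
    """Return the leading (major, minor, patch) of a release string.

    Accepts strings such as "2.6.26" or "2.6.25-14.fc9.x86_64".
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return tuple(int(part) for part in match.groups())


def classify_kernel(release):
    """Classify a kernel release against the 2.99.07 build notes.

    The returned labels are hypothetical names for this sketch only.
    """
    version = parse_kernel_release(release)
    if version == REQUIRED:
        return "full"
    if version == COMPAT:
        return "userland-compat"
    # The announcement does not cover any other kernel version.
    return "untested"
```

On a running system the release string could be taken from `platform.release()` (or `uname -r`) and passed to `classify_kernel`.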

NOTE to packagers: the library API/ABI's are _NOT_ stable (hence 2.9). We
are still shipping shared libraries but remember that they can change
anytime without warning. A bunch of new shared libraries have been added.

The new source tarball can be downloaded here:

   ftp://sources.redhat.com/pub/cluster/releases/cluster-2.99.07.tar.gz
   https://fedorahosted.org/releases/c/l/cluster/cluster-2.99.07.tar.gz

In order to use GFS1, the Linux kernel requires a minimal patch:

   ftp://sources.redhat.com/pub/cluster/releases/lockproto-exports.patch
   https://fedorahosted.org/releases/c/l/cluster/lockproto-exports.patch

To report bugs or issues:

   https://bugzilla.redhat.com/

Would you like to meet the cluster team or members of its community?

   Join us on IRC (irc.freenode.net #linux-cluster) and share your
   experience with other system administrators or power users.

Happy clustering,
Fabio

Under the hood (from 2.99.06):

Andrew Price (1):
       [GFS2] libgfs2: Build with -fPIC

Bob Peterson (14):
       Print log header flags for gfs journals.
       Speed up userspace bitmap manipulation code.
       gfs_fsck crosswrite for block number sanity checking
       Fix some bad references to gfs_tool and gfs_fsck
       Deleted unused function print_map
       Shrink memory 1: eliminate b_size from pseudo-buffer-heads
       Shrink memory 2: get rid of 3 huge in-core bitmaps
       Shrink memory 3: smaller link counts in inode_info
       Better error reporting in gfs2_fsck
       RGRepair: Account for RG blocks inside journals
       gfs2_fsck dupl. blocks between EA and data
       gfs2_edit: Ability to enter "journalX" in block number.
       gfs2_edit: was parsing out gfs1 log descriptors improperly
       gfs2_edit: Improved gfs journal dumps

Christine Caulfield (13):
       [CCS] Set errno when an error occurs.
       [CMAN] Don't use logsys in config modules.
       Revert "[CMAN] Don't use logsys in config modules."
       [CMAN] Don't use logsys in config modules.
       [CCS] Fold ccs_test into ccs_tool and tidy
       [CCS] add -c flag to ccs_tool query
       [CONFIG] Add some more errnos to libccsconfdb
       [CCS] Set return status on failure
       [CCS] Make ccs_tool/ccs_test more consistent
       [CMAN] Fix overridden node names
       [CMAN] pass COROSYNC_ env variables to the daemon
       [CMAN] Display the node's votes in cman_tool status
       qdisk: fix compile error when building without debug.

David Teigland (19):
       gfs_controld: change start message from new members
       gfs_controld: add missing endian conversion
       gfs_controld: byte swap ids earlier
       gfs_controld: close dlm_controld connection
       fenced: improved start messages
       fenced: munge config option code
       fenced: debug logsys options
       dlm_controld: improved start messages
       fenced: complete messages copy start messages
       fenced: munge logging
       dlm_controld: use logsys
       gfs_controld: use logsys
       dlm_controld/gfs_controld: add logging.c file
       groupd: use logsys
       groupd: detect group_mode
       fenced: use group_mode detection
       dlm_controld: use group_mode detection
       gfs_controld: use group_mode detection
       fence_tool: add domain member checks

Fabio M. Di Nitto (42):
       [CCS] Fix LEGACY_CODE ifdef
       [BUILD] Implement --enable_legacy_code in the build system
       [BUILD] Add ccs_test replacement when building legacy_code
       [BUILD] Fix ccs.h include path
       [BUILD] Fix doc install target when building objects outside source tree
       [CCS] Kill obsolted ccs_test
       [RGMANAGER] Port all resource agents to new ccs interface
       [RGMANAGER] Port smb resource agent to ccs_tool
       [BUILD] Fix race condition in oldconfig update/execution
       [RGMANAGER] Use proper ccs_tool query output
       [BUILD] Fix ccs_tool/ccs_test build with new compat code
       [CCS] Inflict hopefully last compat issues love to ccs_t*
       Revert "[RGMANAGER] Use proper ccs_tool query output"
       [RGMANAGER] Port ccs_get to proper ccs_tool output
       [RGMANGER] Fix call to ccs_tool
       [BUILD] Fix ccs_tool linking dir order
       [BUILD] Fix logrotate snippet filename
       [FENCE] Sync fence_apc_snmp from RHEL47 branch
       [BUILD] Fix LOGDIR usage
       [FENCE] Fix fence_apc_snmp logging
       [BUILD] Cleanup linking order for logsys
       [BUILD] Cleanup groupd makefile
       build: update .gitignore
       Revert "fence: port scsi agent to use ccs_tool query and drop XML::LibXML requirement"
       Revert "fence: simplify init script"
       Revert "rgmanger: remove check on cluster.conf from rgmanager init script"
       rgmanger: remove check on cluster.conf from rgmanager init script
       fence: simplify init script
       fence: port scsi agent to use ccs_tool query and drop XML::LibXML requirement
       rgmanager: fix clean target
       cman: init script should not user cluster.conf directly
       rgmanager: init script does not need network config
       config: allow users to override default config file in xmlconfig
       test commit
       Revert "test commit"
       bindings: add first cut of perl Cluster:CCS
       bindings: improve Cluster::CCS description
       build: clean up perl bindings build system
       misc: clean up "char const *" vs "const char *"
       init: standardize init scripts to /etc/sysconfig/cluster
       build: fix bindings build when using external object tree
       bindings: fix CCS.pm doc

Lon Hohberger (2):
       [rgmanager] Add optional save/restore to vm resource
       [qdisk] Make stop_cman="1" work if heuristics fail during initialization

Ryan McCabe (1):
       fence: update apc snmp agent

Ryan O'Hara (3):
       gfs_mkfs: change the way we check to see if a device is mounted
       cman: add option to init script to prevent joining the fence domain
       cman: fix typo (#!/bin/bash) from previous commit

  .gitignore                                       |    7 +
  bindings/perl/Makefile                           |    4 +-
  bindings/perl/ccs/CCS.pm.in                      |  145 +++++
  bindings/perl/ccs/CCS.xs                         |   82 +++
  bindings/perl/ccs/MANIFEST                       |    7 +
  bindings/perl/ccs/META.yml.in                    |   13 +
  bindings/perl/ccs/Makefile.PL                    |   28 +
  bindings/perl/ccs/Makefile.bindings              |   11 +
  bindings/perl/ccs/test.pl                        |   20 +
  bindings/perl/ccs/typemap                        |    1 +
  ccs/ccs_tool/Makefile                            |   35 +-
  ccs/ccs_tool/ccs_tool.c                          |  261 ++++++++-
  ccs/ccs_tool/old_parser.c                        |  688 ----------------------
  ccs/ccs_tool/old_parser.h                        |   64 --
  ccs/ccs_tool/upgrade.c                           |  259 --------
  ccs/ccs_tool/upgrade.h                           |    6 -
  ccs/libccscompat/libccscompat.h                  |    2 +-
  ccs/man/Makefile                                 |    5 +
  ccs/man/ccs_test.8                               |  132 +++++
  cman/cman_tool/cman_tool.h                       |    2 +-
  cman/cman_tool/join.c                            |   19 +-
  cman/cman_tool/main.c                            |    7 +-
  cman/daemon/cman-preconfig.c                     |   35 +-
  cman/init.d/Makefile                             |   16 +-
  cman/init.d/cman                                 |  648 ++++++++++++++++++++
  cman/init.d/cman.in                              |  592 -------------------
  cman/qdisk/main.c                                |    4 +-
  config/libs/libccsconfdb/ccs.h                   |    2 +-
  config/libs/libccsconfdb/libccs.c                |   69 ++-
  config/plugins/ldap/configldap.c                 |   10 +-
  config/plugins/xml/config.c                      |   20 +-
  config/tools/Makefile                            |    2 +-
  config/tools/ccs_test/Makefile                   |   32 -
  config/tools/ccs_test/ccs_test.c                 |  147 -----
  config/tools/man/Makefile                        |    2 +-
  config/tools/man/ccs_test.8                      |  132 -----
  configure                                        |   23 +-
  doc/Makefile                                     |    6 +-
  fence/agents/apc_snmp/fence_apc_snmp.py          |  581 +++++++++++--------
  fence/agents/scsi/fence_scsi.pl                  |   22 +-
  fence/agents/scsi/fence_scsi_test.pl             |   26 +-
  fence/agents/scsi/scsi_reserve                   |   24 +-
  fence/fence_tool/fence_tool.c                    |  260 ++++-----
  fence/fenced/Makefile                            |    6 +-
  fence/fenced/config.c                            |   68 ++-
  fence/fenced/config.h                            |   29 +
  fence/fenced/cpg.c                               |  565 +++++++++++-------
  fence/fenced/fd.h                                |   40 +-
  fence/fenced/group.c                             |   29 +
  fence/fenced/logging.c                           |   42 +-
  fence/fenced/main.c                              |   90 ++--
  fence/fenced/member_cman.c                       |    3 +-
  fence/fenced/recover.c                           |   21 +-
  fence/libfenced/libfenced.h                      |    3 +
  gfs/gfs_mkfs/main.c                              |   29 +-
  gfs2/edit/hexedit.c                              |  290 +++++++---
  gfs2/edit/savemeta.c                             |    9 +-
  gfs2/fsck/eattr.c                                |   21 +-
  gfs2/fsck/eattr.h                                |   20 +-
  gfs2/fsck/fs_recovery.c                          |    4 +-
  gfs2/fsck/fsck.h                                 |    5 +-
  gfs2/fsck/initialize.c                           |   10 +-
  gfs2/fsck/lost_n_found.c                         |    7 +-
  gfs2/fsck/main.c                                 |   35 +-
  gfs2/fsck/metawalk.c                             |  177 ++++--
  gfs2/fsck/metawalk.h                             |   16 +-
  gfs2/fsck/pass1.c                                |  405 +++++++++-----
  gfs2/fsck/pass1b.c                               |   95 ++--
  gfs2/fsck/pass1c.c                               |   69 ++-
  gfs2/fsck/pass2.c                                |   61 ++-
  gfs2/fsck/pass3.c                                |   20 +-
  gfs2/fsck/pass4.c                                |   11 +-
  gfs2/fsck/pass5.c                                |    2 +-
  gfs2/fsck/rgrepair.c                             |   58 ++-
  gfs2/libgfs2/Makefile                            |    1 +
  gfs2/libgfs2/bitmap.c                            |   79 ++-
  gfs2/libgfs2/block_list.c                        |  232 ++++----
  gfs2/libgfs2/buf.c                               |    1 -
  gfs2/libgfs2/fs_bits.c                           |    2 +-
  gfs2/libgfs2/fs_ops.c                            |   38 +-
  gfs2/libgfs2/libgfs2.h                           |   93 ++-
  gfs2/libgfs2/recovery.c                          |    2 +-
  gfs2/libgfs2/rgrp.c                              |    8 +
  group/daemon/Makefile                            |   10 +-
  group/daemon/app.c                               |    3 +
  group/daemon/cpg.c                               |  369 ++++++++++++
  group/daemon/gd_internal.h                       |   51 ++-
  group/daemon/logging.c                           |  170 ++++++
  group/daemon/main.c                              |  177 ++++++-
  group/dlm_controld/Makefile                      |    8 +-
  group/dlm_controld/config.c                      |   39 ++-
  group/dlm_controld/config.h                      |    5 +-
  group/dlm_controld/cpg.c                         |  350 ++++++------
  group/dlm_controld/dlm_daemon.h                  |   34 +-
  group/dlm_controld/group.c                       |   29 +
  group/dlm_controld/logging.c                     |  171 ++++++
  group/dlm_controld/main.c                        |   63 +--
  group/dlm_controld/member_cman.c                 |    3 +-
  group/gfs_controld/Makefile                      |    6 +-
  group/gfs_controld/config.c                      |   59 ++-
  group/gfs_controld/config.h                      |    5 +-
  group/gfs_controld/cpg-new.c                     |  188 ++++---
  group/gfs_controld/gfs_daemon.h                  |   44 ++-
  group/gfs_controld/group.c                       |   29 +
  group/gfs_controld/logging.c                     |  171 ++++++
  group/gfs_controld/main.c                        |   52 ++-
  group/gfs_controld/member_cman.c                 |    1 +
  group/gfs_controld/util.c                        |    1 +
  group/lib/libgroup.c                             |   25 +
  group/lib/libgroup.h                             |    2 +
  make/binding-passthrough.mk                      |    7 +
  make/defines.mk.input                            |    3 +-
  make/fencebuild.mk                               |    1 +
  make/install.mk                                  |    4 +-
  make/perl-binding-common.mk                      |   30 +
  rgmanager/init.d/Makefile                        |   12 +-
  rgmanager/init.d/rgmanager                       |  141 +++++
  rgmanager/init.d/rgmanager.in                    |  154 -----
  rgmanager/src/resources/apache.sh                |   11 +-
  rgmanager/src/resources/mysql.sh                 |   12 +-
  rgmanager/src/resources/named.sh                 |   11 +-
  rgmanager/src/resources/openldap.sh              |   12 +-
  rgmanager/src/resources/postgres-8.sh            |   12 +-
  rgmanager/src/resources/samba.sh                 |   12 +-
  rgmanager/src/resources/smb.sh                   |  104 +---
  rgmanager/src/resources/tomcat-5.sh              |   12 +-
  rgmanager/src/resources/utils/config-utils.sh.in |   66 +--
  rgmanager/src/resources/utils/messages.sh        |    4 -
  rgmanager/src/resources/vm.sh                    |   30 +
  129 files changed, 5659 insertions(+), 4191 deletions(-)

--
I'm going to make him an offer he can't refuse.
Balaji | 1 Aug 2008 12:06

HP ILO Fence Configuration

Dear All,

  I am currently using an HP x6600 server with RHEL4 Update 4 AS and the
  Cluster Suite for RHEL4 Update 4 installed.
  I am new to fencing. Can anyone help me configure HP ILO fencing on my
  server and explain how HP ILO fencing works?

Regards
-S.Balaji

Singh Raina, Ajeet | 1 Aug 2008 12:16

Directories gets Deleted during Failover

Hi,

I have been busy setting up a two-node cluster and find that during failover the directories created under the mount point get deleted.

Please let me know why it behaves this way.


ajeet


This e-mail and any attachment is for authorised use by the intended recipient(s) only. It may contain proprietary material, confidential information and/or be subject to legal privilege. It should not be copied, disclosed to, retained or used by, any other party. If you are not an intended recipient then please promptly delete this e-mail and any attachment and all copies and inform the sender. Thank you.
Fabio M. Di Nitto | 1 Aug 2008 13:08

Cluster 2.03.06 released


The cluster team and its vibrant community are proud to announce the 7th
release from the STABLE2 branch: 2.03.06.

The STABLE2 branch collects, on a daily basis, all bug fixes and the bare
minimum changes required to run the cluster on top of the most recent
Linux kernel (2.6.26) and rock solid openais (0.80.3).

The 2.03.06 release ports the kernel modules and userland to the 2.6.26
kernel. Userland can also run in compatibility mode with the 2.6.25
kernel.

NOTE: The STABLE2 branch will not build on top of the new corosync/openais
tree for this release. The very latest code from openais that can be used
is svn r1579. Porting to corosync will happen in the future.

The new source tarball can be downloaded here:

   ftp://sources.redhat.com/pub/cluster/releases/cluster-2.03.06.tar.gz
   https://fedorahosted.org/releases/c/l/cluster/cluster-2.03.06.tar.gz

In order to use GFS1, the Linux kernel requires a minimal patch:

   ftp://sources.redhat.com/pub/cluster/releases/lockproto-exports.patch
   https://fedorahosted.org/releases/c/l/cluster/lockproto-exports.patch

To report bugs or issues:

   https://bugzilla.redhat.com/

Would you like to meet the cluster team or members of its community?

   Join us on IRC (irc.freenode.net #linux-cluster) and share your
   experience with other system administrators or power users.

Happy clustering,
Fabio

Under the hood (from 2.03.05):

Bob Peterson (15):
       Replace put_inode with drop_inode
       Print log header flags for gfs journals.
       Speed up userspace bitmap manipulation code.
       gfs_fsck crosswrite for block number sanity checking
       Fix some bad references to gfs_tool and gfs_fsck
       Deleted unused function print_map
       Shrink memory 1: eliminate b_size from pseudo-buffer-heads
       Shrink memory 2: get rid of 3 huge in-core bitmaps
       Shrink memory 3: smaller link counts in inode_info
       Better error reporting in gfs2_fsck
       RGRepair: Account for RG blocks inside journals
       gfs2_fsck dupl. blocks between EA and data
       gfs2_edit: Ability to enter "journalX" in block number.
       gfs2_edit: was parsing out gfs1 log descriptors improperly
       gfs2_edit: Improved gfs journal dumps

Christine Caulfield (2):
       [CMAN] Add node votes to 'cman_tool status' output
       cman: revert dirty patch

David Teigland (3):
       gfs_controld: read plocks from dlm or lock_dlm
       fenced: update cman only after complete success
       groupd: ignore nolock gfs

Fabio M. Di Nitto (5):
       [GNBD] Update gnbd to work with 2.6.26
       [GFS] Make gfs build with 2.6.26 (DO NOT USE!)
       [GFS] Fix comment
       [BUILD] Add install/uninstall snippets for documents
       [FENCE] Sync fence_apc_snmp from RHEL47 branch

Lon Hohberger (1):
       [qdisk] Make stop_cman="1" work if heuristics fail during initialization

Ryan McCabe (1):
       fence: update apc snmp agent

Ryan O'Hara (2):
       gfs_mkfs: change the way we check to see if a device is mounted
       cman: add option to init script to prevent joining the fence domain

  cman/cman_tool/main.c                   |    1 +
  cman/daemon/commands.c                  |    3 +-
  cman/init.d/cman.in                     |   93 ++++--
  cman/qdisk/main.c                       |    2 +
  fence/agents/apc_snmp/fence_apc_snmp.py |  581 ++++++++++++++++++-------------
  fence/fenced/agent.c                    |   16 +-
  gfs-kernel/src/gfs/ops_address.c        |    2 +-
  gfs-kernel/src/gfs/ops_super.c          |    7 +-
  gfs-kernel/src/gfs/quota.c              |    4 +-
  gfs/gfs_mkfs/main.c                     |   29 +-
  gfs2/edit/hexedit.c                     |  290 ++++++++++++----
  gfs2/edit/savemeta.c                    |    9 +-
  gfs2/fsck/eattr.c                       |   21 +-
  gfs2/fsck/eattr.h                       |   20 +-
  gfs2/fsck/fs_recovery.c                 |    4 +-
  gfs2/fsck/fsck.h                        |    5 +-
  gfs2/fsck/initialize.c                  |   10 +-
  gfs2/fsck/lost_n_found.c                |    7 +-
  gfs2/fsck/main.c                        |   35 +--
  gfs2/fsck/metawalk.c                    |  177 +++++++----
  gfs2/fsck/metawalk.h                    |   16 +-
  gfs2/fsck/pass1.c                       |  405 ++++++++++++++--------
  gfs2/fsck/pass1b.c                      |   95 +++---
  gfs2/fsck/pass1c.c                      |   69 +++--
  gfs2/fsck/pass2.c                       |   61 ++--
  gfs2/fsck/pass3.c                       |   20 +-
  gfs2/fsck/pass4.c                       |   11 +-
  gfs2/fsck/pass5.c                       |    2 +-
  gfs2/fsck/rgrepair.c                    |   58 +++-
  gfs2/libgfs2/bitmap.c                   |   79 ++++-
  gfs2/libgfs2/block_list.c               |  232 ++++++-------
  gfs2/libgfs2/buf.c                      |    1 -
  gfs2/libgfs2/fs_bits.c                  |    2 +-
  gfs2/libgfs2/fs_ops.c                   |   38 +-
  gfs2/libgfs2/libgfs2.h                  |   93 ++++--
  gfs2/libgfs2/recovery.c                 |    2 +-
  gfs2/libgfs2/rgrp.c                     |    8 +
  gnbd-kernel/src/gnbd.c                  |   91 +++---
  gnbd-kernel/src/gnbd.h                  |    4 +-
  group/daemon/main.c                     |   28 ++-
  group/gfs_controld/lock_dlm.h           |    1 +
  group/gfs_controld/plock.c              |  254 +++++++++++---
  make/install.mk                         |    4 +
  make/uninstall.mk                       |    3 +
  44 files changed, 1841 insertions(+), 1052 deletions(-)

--
I'm going to make him an offer he can't refuse.
Ozgur Akan | 1 Aug 2008 15:33

network for cluster communication

Hi,

I have two important questions regarding cluster performance.

I have added a second Ethernet card to each of my two nodes.

- How can I configure the cluster so that the nodes use this new interface (network) to communicate with each other?

- Is the speed of this local network between the two nodes an important criterion for file locks on GFS?

thanks,
Ozgur Akan
Bob Peterson | 1 Aug 2008 15:34

Re: Directories gets Deleted during Failover

Hi Ajeet,

On Fri, 2008-08-01 at 15:46 +0530, Singh Raina, Ajeet wrote:
> Hi,
> 
> I have been setting up a two-node cluster and find that during
> failover the directories created under the mount point get
> deleted.
> 
> Could you please let me know why it behaves this way?

You haven't given us enough information.
You haven't even said whether the file system is GFS, GFS2, EXT3,
XFS, etc., or NFS over one of the above.  In general, directories
should not just disappear.  Perhaps one of your nodes has the
file system mounted and the other does not, so when failover
occurs, it just looks like the directories are gone?
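
One way to check that last possibility is to ask each node whether the mount point is really mounted, and with which file system type. The following is only a minimal sketch, assuming a Linux /proc/mounts; the helper name mount_info and the path /mnt/data are made up for illustration and are not from the original report.

```shell
# Print the file system type of a mount point, reading a mounts table
# (default: /proc/mounts). Returns non-zero if the mount point is absent.
# $1 = mount point, $2 = mounts file (optional, handy for testing)
mount_info() {
    awk -v mp="$1" '$2 == mp { print $3; found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}

# Run this on each node and compare the answers.
# /mnt/data is a hypothetical mount point.
if fstype=$(mount_info /mnt/data); then
    echo "/mnt/data is mounted ($fstype) on $(uname -n)"
else
    echo "/mnt/data is NOT mounted on $(uname -n)"
fi
```

If one node reports the mount point as not mounted, what it shows there is just the empty underlying directory, which would look exactly like the directories having been deleted.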

Regards,

Bob Peterson
Red Hat Clustering & GFS

Christine Caulfield | 1 Aug 2008 15:38

Re: network for cluster communication

Ozgur Akan wrote:
> Hi,
> 
> I have two important questions regarding cluster performance.
> 
> I have added a second Ethernet card to each of my two nodes.
> 
> - How can I configure the cluster so that the nodes use this new
> interface (network) to communicate with each other?

Put the host name or IP address of the new interface in cluster.conf, in 
place of the existing host names.

> - Is the speed of this local network between the two nodes an important
> criterion for file locks on GFS?
> 

Yes, very :)
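
For the first question, the change amounts to swapping the name attribute of each clusternode entry in cluster.conf for a name (or address) that belongs to the new interface. The following is only a rough sketch under assumptions: the node names node1/node2 and their -priv counterparts are made up, rename_node is a hypothetical helper rather than a cluster tool, the new names must already resolve to the second interface (for example via /etc/hosts), and config_version in the cluster element still has to be increased by hand.

```shell
# Hypothetical helper: swap one clusternode name, reading cluster.conf on
# stdin and writing the result to stdout.
# $1 = current node name, $2 = name that resolves to the new interface.
# Names containing "/" or regex metacharacters would need escaping first.
rename_node() {
    sed "s/\(<clusternode[^>]* name=\"\)$1\"/\1$2\"/"
}

# Example with made-up names. On a real node the input would be a copy of
# /etc/cluster/cluster.conf, never the live file.
printf '%s\n' \
    '<clusternodes>' \
    '  <clusternode name="node1" nodeid="1" votes="1"/>' \
    '  <clusternode name="node2" nodeid="2" votes="1"/>' \
    '</clusternodes>' |
    rename_node node1 node1-priv | rename_node node2 node2-priv
```

Working on a copy keeps the running configuration untouched until the edited file has been reviewed and distributed to every node.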

Chrissie

Lon Hohberger | 1 Aug 2008 21:34

Re: "Inc" column description/semnification

On Thu, 2008-07-31 at 10:04 +0300, Alex wrote:
> On Wednesday 30 July 2008 20:36, Lon Hohberger wrote:
> > On Wed, 2008-07-30 at 14:52 +0300, Alex wrote:
> > > Hello,
> > >
> > > What does the "Inc" column mean in the output of the cman_tool nodes
> > > command?
> > >
> > > [root@rs2 ~]# cman_tool nodes
> > > Node  Sts   Inc   Joined               Name
> > >    1   M      8   2008-07-30 11:03:12  192.168.113.5
> > >    2   M      4   2008-07-30 10:59:34  192.168.113.4
> > > [root@rs2 ~]#
> > >
> > > Can anybody tell me what 4 and 8 represent in the Inc column?
> >
> > Local incarnation # for the node, if I recall correctly.  They usually
> > do not match cluster-wide.
> 
> Now that we know its name, let me ask about the meaning of Inc: how 
> should it be interpreted, and what do 8 and 4 represent in the column
> above? 8m, 8pps, 8kbps, 8kV, women, men, aliens? The manual and
> documentation say absolutely nothing about the Inc column!

I'm pretty sure it's the Totem protocol sequence # the local node
recorded for when it first "saw" the node.  The "Joined" time is the
same thing, except it's according to the local node's clock instead of
the Totem token sequence #.

That's all they are.  They don't indicate anything useful for
monitoring.

> And another question: why do the numbers in the Inc column change every
> time a node is rebooted and then stay constant until the next reboot?

The sequence # is different the next time the node is "seen".  You'll
also notice the "Joined" value is different.

The "Inc" column and "Joined" column are set at the same time but are
not related to each other value-wise.

-- Lon

Lon Hohberger | 1 Aug 2008 21:38

Re: how to mount a gfs2 volume on all our real webservers in /var/www/html

On Thu, 2008-07-31 at 13:22 +0300, Alex wrote:
> On Thursday 31 July 2008 11:52, 张会光 wrote:
> > This is a typical LVS model.
> 
> Indeed, it is LVS. I have a router in front of the rs1, rs2 and rs3
> webservers, which is configured as LVS with load balancing.
> 
> > Do not add your httpd script and mount script into source in your
> > cluster.conf
> 
> The Red Hat howto "Example of Setting Up Apache HTTP Server" says not to
> start the httpd server at boot time and to let the cluster do that! That's
> why I added http_service to my cluster.conf.

It's a different use case than what you want.  The one in the
documentation you were reading is referring to failover of a single
instance of httpd, not running httpd on 3 nodes at the same time.

* put your gfs2 volumes in /etc/fstab
* turn on httpd
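
The two steps above could look roughly like the following sketch. The device path /dev/vg_web/lv_html, the mount options and the helper name fstab_entry are assumptions for illustration only, and the commands that would change the system are left commented out. It also assumes the gfs2 init script shipped with gfs2-utils, which mounts gfs2 entries from /etc/fstab at boot.

```shell
# Build an /etc/fstab entry for a shared GFS2 volume.
# $1 = block device, $2 = mount point
fstab_entry() {
    printf '%s  %s  gfs2  defaults  0 0\n' "$1" "$2"
}

# Show the line that would be added (hypothetical device path).
fstab_entry /dev/vg_web/lv_html /var/www/html

# On each real server, as root (commented out so this sketch changes nothing):
# fstab_entry /dev/vg_web/lv_html /var/www/html >> /etc/fstab
# chkconfig gfs2 on      # mount gfs2 fstab entries at boot
# chkconfig httpd on     # start httpd at boot on every node
# mount /var/www/html && service httpd start
```

With this approach httpd runs on all three nodes at once and is no longer started by the cluster, which is the difference from the single-instance failover case in the howto.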

-- Lon

