C. Handel | 23 Jul 17:53 2014

corosync ring failure

Hi,

I run a cluster with two corosync rings. One of the rings is marked
faulty every forty seconds, only to recover a second later. The other
ring is stable.

I have no idea how to debug this.

We are running SL 6.5 with pacemaker 1.1.10, cman 3.0.12, and corosync 1.4.1.
The cluster consists of three machines. Ring1 runs on 10-gigabit
interfaces, Ring0 on 1-gigabit interfaces. Neither ring leaves its
respective switch.

corosync communication is udpu, rrp_mode is passive
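
As a starting point for debugging (this is a sketch added for reference, not part of the original report), the ring state and the corosync log can be watched on each node while the fault/recover cycle happens; the log path and exact message wording depend on the logging configuration and corosync version:

    # current ring status of the local node (run on all three machines)
    corosync-cfgtool -s

    # watch the status flip, and look for fault/recovery messages
    watch -n 1 corosync-cfgtool -s
    grep -iE 'faulty|recovered ring' /var/log/cluster/corosync.log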

cluster.conf:

<cluster config_version="30" name="aslfile">

<cman transport="udpu">
</cman>

<fence_daemon post_join_delay="120" post_fail_delay="30"/>

<fencedevices>
        <fencedevice name="pcmk" agent="fence_pcmk" action="off"/>
</fencedevices>

<quorumd
   cman_label="qdisk"

Devin A. Bougie | 15 Jul 17:36 2014

mixed 6.4 and 6.5 cluster - delays accessing mpath devices and clustered lvm's

We have a cluster of EL6.4 servers, with one server fully updated to EL6.5.  After upgrading to 6.5, we see
unreasonably long delays accessing some mpath devices and clustered LVMs on the 6.5 member.  There are no
problems with the 6.4 members.

This can be seen by strace'ing lvscan.  In the following example, the syscall time is at the end of each line;
the reads returning ASCII text are from mpath devices, the rest are from volumes:

------
16241 read(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 <1.467385>
16241 read(5, "\17u\21^ LVM2 x[5A%r0N*>\1\0\0\0\0\20\0\0\0\0\0\0"..., 4096) = 4096 <1.760943>
16241 read(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 <1.164032>
16241 read(5, "gment1 {\nstart_extent = 0\nextent"..., 4096) = 4096 <2.859972>
16241 read(5, "\353H\220\20\216\320\274\0\260\270\0\0\216\330\216\300\373\276\0|\277\0\6\271\0\2\363\244\352!\6\0"..., 4096) = 4096 <1.717222>
16241 read(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 <1.476014>
16241 read(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 <1.800225>
16241 read(5, "3\300\216\320\274\0|\216\300\216\330\276\0|\277\0\6\271\0\2\374\363\244Ph\34\6\313\373\271\4\0"..., 4096) = 4096 <2.008620>
16241 read(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 <2.021734>
16241 read(5, "3\300\216\320\274\0|\216\300\216\330\276\0|\277\0\6\271\0\2\374\363\244Ph\34\6\313\373\271\4\0"..., 4096) = 4096 <2.126359>
16241 read(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) =
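
The per-syscall timings shown above are the kind of output strace prints with its -T option; an invocation along the following lines (the exact flags used by the original poster are not shown, so this is an assumption) reproduces it:

    strace -f -T -e trace=read -o /tmp/lvscan.strace lvscan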

abdul mujeeb Siddiqui | 15 Jul 16:18 2014

Basename mismatch

Hello, I have implemented the Red Hat Cluster Suite on Red Hat Enterprise Linux 6.4 and am trying to run Oracle 11gR2 on it, but the Oracle service is unable to start.
The listener is not starting.
If anyone has implemented Oracle 11gR2, please
send me your cluster.conf and oracledb.sh, and also your listener.ora and tnsnames.ora files.
Thanks in advance
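
For reference, a minimal service stanza for the rgmanager Oracle agent might look like the sketch below. Every path, the SID, and the IP address are placeholders, and the oracledb attribute names are assumptions to be verified against /usr/share/cluster/oracledb.sh and the Cluster Administration guide on the installed system:

    <rm>
            <service autostart="1" exclusive="0" name="oradb" recovery="relocate">
                    <!-- shared filesystem holding the datafiles (placeholder device/mountpoint) -->
                    <fs name="orafs" device="/dev/mapper/oradata" mountpoint="/u01" fstype="ext4"/>
                    <!-- floating IP the listener binds to (placeholder address) -->
                    <ip address="192.0.2.10" monitor_link="on"/>
                    <!-- Oracle instance; attribute names should be checked against oracledb.sh -->
                    <oracledb name="ORCL" user="oracle"
                              home="/u01/app/oracle/product/11.2.0/db_1"
                              type="base" listener_name="LISTENER"/>
            </service>
    </rm>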

--

-- 
Linux-cluster mailing list
Linux-cluster <at> redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
Laszlo Budai | 10 Jul 13:49 2014

Cman does not start when the quorum disk is not available

Dear all,

We have a RHEL 6.3 cluster of two nodes and a quorum disk.
We are testing the cluster against different failures. We have a problem
when the shared storage is disconnected from one of the nodes. The node
that has lost contact with the storage is fenced, but when the machine
restarts, cman will not start up (it tries to start but then stops):

Jul  9 17:55:54 clnode1p kdump: started up
Jul  9 17:55:54 clnode1p kernel: bond0: no IPv6 routers present
Jul  9 17:55:54 clnode1p kernel: DLM (built Jun 13 2012 18:26:45) installed
Jul  9 17:55:55 clnode1p corosync[2514]:   [MAIN  ] Corosync Cluster Engine ('1.4.1'): started and ready to provide service.
Jul  9 17:55:55 clnode1p corosync[2514]:   [MAIN  ] Corosync built-in features: nss dbus rdma snmp
Jul  9 17:55:55 clnode1p corosync[2514]:   [MAIN  ] Successfully read config from /etc/cluster/cluster.conf
Jul  9 17:55:55 clnode1p corosync[2514]:   [MAIN  ] Successfully parsed cman config
Jul  9 17:55:55 clnode1p corosync[2514]:   [TOTEM ] Initializing transport (UDP/IP Multicast).
Jul  9 17:55:55 clnode1p corosync[2514]:   [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Jul  9 17:55:55 clnode1p corosync[2514]:   [TOTEM ] The network interface [172.16.255.1] is now up.
Jul  9 17:55:55 clnode1p corosync[2514]:   [QUORUM] Using quorum provider quorum_cman
Jul  9 17:55:55 clnode1p corosync[2514]:   [SERV  ] Service engine loaded: corosync cluster quorum service v0.1
Jul  9 17:55:55 clnode1p corosync[2514]:   [CMAN  ] CMAN 3.0.12.1 (built May  8 2012 12:22:26) started
Jul  9 17:55:55 clnode1p corosync[2514]:   [SERV  ] Service engine loaded: corosync CMAN membership service 2.90
Jul  9 17:55:55 clnode1p corosync[2514]:   [SERV  ] Service engine loaded: openais checkpoint service B.01.01
Jul  9 17:55:55 clnode1p corosync[2514]:   [SERV  ] Service engine loaded: corosync extended virtual synchrony service
Jul  9 17:55:55 clnode1p corosync[2514]:   [SERV  ] Service engine loaded: corosync configuration service
Jul  9 17:55:55 clnode1p corosync[2514]:   [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01
Jul  9 17:55:55 clnode1p corosync[2514]:   [SERV  ] Service engine loaded: corosync cluster config database access v1.01
Jul  9 17:55:55 clnode1p corosync[2514]:   [SERV  ] Service engine loaded: corosync profile loading service
Jul  9 17:55:55 clnode1p corosync[2514]:   [QUORUM] Using quorum provider quorum_cman
Jul  9 17:55:55 clnode1p corosync[2514]:   [SERV  ] Service engine loaded: corosync cluster quorum service v0.1
Jul  9 17:55:55 clnode1p corosync[2514]:   [MAIN  ] Compatibility mode set to whitetank.  Using V1 and V2 of the synchronization engine.
Jul  9 17:55:55 clnode1p corosync[2514]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jul  9 17:55:55 clnode1p corosync[2514]:   [QUORUM] Members[1]: 1
Jul  9 17:55:55 clnode1p corosync[2514]:   [QUORUM] Members[1]: 1
Jul  9 17:55:55 clnode1p corosync[2514]:   [CPG   ] chosen downlist: sender r(0) ip(172.16.255.1) ; members(old:0 left:0)
Jul  9 17:55:55 clnode1p corosync[2514]:   [MAIN  ] Completed service synchronization, ready to provide service.
Jul  9 17:55:55 clnode1p corosync[2514]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jul  9 17:55:55 clnode1p corosync[2514]:   [CMAN  ] quorum regained, resuming activity
Jul  9 17:55:55 clnode1p corosync[2514]:   [QUORUM] This node is within the primary component and will provide service.
Jul  9 17:55:55 clnode1p corosync[2514]:   [QUORUM] Members[2]: 1 2
Jul  9 17:55:55 clnode1p corosync[2514]:   [QUORUM] Members[2]: 1 2
Jul  9 17:55:55 clnode1p corosync[2514]:   [CPG   ] chosen downlist: sender r(0) ip(172.16.255.1) ; members(old:1 left:0)
Jul  9 17:55:55 clnode1p corosync[2514]:   [MAIN  ] Completed service synchronization, ready to provide service.
Jul  9 17:55:59 clnode1p kernel: bond1: no IPv6 routers present
Jul  9 17:55:59 clnode1p qdiskd[2564]: Loading dynamic configuration
Jul  9 17:55:59 clnode1p qdiskd[2564]: Setting votes to 1
Jul  9 17:55:59 clnode1p qdiskd[2564]: Loading static configuration
Jul  9 17:55:59 clnode1p qdiskd[2564]: Timings: 8 tko, 1 interval
Jul  9 17:55:59 clnode1p qdiskd[2564]: Timings: 2 tko_up, 4 master_wait, 2 upgrade_wait
Jul  9 17:55:59 clnode1p qdiskd[2564]: Heuristic: '/bin/ping -c1 -w1 clswitch1m' score=1 interval=2 tko=4
Jul  9 17:55:59 clnode1p qdiskd[2564]: Heuristic: '/bin/ping -c1 -w1 clswitch2m' score=1 interval=2 tko=4
Jul  9 17:55:59 clnode1p qdiskd[2564]: 2 heuristics loaded
Jul  9 17:55:59 clnode1p qdiskd[2564]: Quorum Daemon: 2 heuristics, 1 interval, 8 tko, 1 votes
Jul  9 17:55:59 clnode1p qdiskd[2564]: Run Flags: 00000271
Jul  9 17:55:59 clnode1p qdiskd[2564]: stat
Jul  9 17:55:59 clnode1p qdiskd[2564]: qdisk_validate: No such file or directory
Jul  9 17:55:59 clnode1p qdiskd[2564]: Specified partition /dev/mapper/apsto1-vd01-v001 does not have a qdisk label
Jul  9 17:56:01 clnode1p corosync[2514]:   [SERV  ] Unloading all Corosync service engines.
Jul  9 17:56:01 clnode1p corosync[2514]:   [SERV  ] Service engine unloaded: corosync extended virtual synchrony service
Jul  9 17:56:01 clnode1p corosync[2514]:   [SERV  ] Service engine unloaded: corosync configuration service
Jul  9 17:56:01 clnode1p corosync[2514]:   [SERV  ] Service engine unloaded: corosync cluster closed process group service v1.01
Jul  9 17:56:01 clnode1p corosync[2514]:   [SERV  ] Service engine unloaded: corosync cluster config database access v1.01
Jul  9 17:56:01 clnode1p corosync[2514]:   [SERV  ] Service engine unloaded: corosync profile loading service
Jul  9 17:56:01 clnode1p corosync[2514]:   [SERV  ] Service engine unloaded: openais checkpoint service B.01.01
Jul  9 17:56:01 clnode1p corosync[2514]:   [SERV  ] Service engine unloaded: corosync CMAN membership service 2.90
Jul  9 17:56:01 clnode1p corosync[2514]:   [SERV  ] Service engine unloaded: corosync cluster quorum service v0.1
Jul  9 17:56:01 clnode1p corosync[2514]:   [MAIN  ] Corosync Cluster Engine exiting with status 0 at main.c:1864.

And it remains in this state even if the storage is reattached later on,
so right now I have only one functioning node.
What can be done to fix this (to get the cluster framework started again)?
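
For reference (a sketch added here, not part of the original message): before retrying the startup, one can check whether the reattached device is visible again and still carries its qdisk label, using the device name from the log above:

    multipath -ll        # is the multipath device back?
    mkqdisk -L           # should list /dev/mapper/apsto1-vd01-v001 and its qdisk label
    service cman start   # then retry cman (and the rest of the stack)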

Thank you,
Laszlo

-- 
Acceleris System Integration | and IT works

Laszlo Budai | Technical Consultant
Bvd. Barbu Vacarescu 80 | RO-020282 Bucuresti
t +40 21 23 11 538
laszlo.budai <at> acceleris.ro | www.acceleris.ro

Acceleris Offices are in:
Basel | Bucharest | Zollikofen | Renens | Kloten

--

-- 
Linux-cluster mailing list
Linux-cluster <at> redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

Amjad Syed | 1 Jul 10:27 2014

Virtual IP service

Hello

I am trying to start a virtual ip service on my 2 node cluster.

Here are the details of the network setting and configuration.

1. Bond (heartbeat). This is a private network with no switch involved, and not reachable from the public network.
   node1: 192.168.10.11
   node2: 192.168.10.10

2. Fencing (iLO). This one goes through a switch.
   node1: 10.10.63.92
   node2: 10.10.63.93

3. Public IP addresses
   node1: 10.10.5.100
   node2: 10.10.5.20

I have set the virtual IP to 10.10.5.23 in cluster.conf:
  <service autostart="1" exclusive="0" name="IP" recovery="relocate">
                <ip address="10.10.5.23" monitor_link="on" sleeptime="10"/>

However, this virtual IP does not work, since cman communication is on the 192.168.10.x network. When I try to move cman to the 10.10.5.x network, the nodes go into a fence loop, i.e. they fence each other.

So I am asking: is there a "network preference" option or similar in cluster.conf that can map the virtual IP service to the private network addresses?
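
For reference, a sketch of one common arrangement (not a confirmed answer; the host names are placeholders): keep cman bound to the private bond by letting the cluster node names resolve to the 192.168.10.x addresses, and leave the <ip> resource on the public subnet. The rgmanager ip resource generally brings the address up on whichever interface already carries an address in the same subnet, so 10.10.5.23 would land on the public interface regardless of where cman talks:

    # /etc/hosts on both nodes: names used in cluster.conf point at the heartbeat bond
    192.168.10.11   node1-hb
    192.168.10.10   node2-hb

    <!-- cluster.conf: cman talks over the private names, the VIP stays public -->
    <clusternodes>
            <clusternode name="node1-hb" nodeid="1"/>
            <clusternode name="node2-hb" nodeid="2"/>
    </clusternodes>
    <service autostart="1" exclusive="0" name="IP" recovery="relocate">
            <ip address="10.10.5.23" monitor_link="on" sleeptime="10"/>
    </service>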

Thank you
--

-- 
Linux-cluster mailing list
Linux-cluster <at> redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
Eivind Olsen | 30 Jun 01:48 2014

Which fence agents to use?

Hello.

I am currently planning a 2-node cluster based on RHEL 6.5 and the High Availability Add-On, with the goal of
running Oracle 11g in active/passive failover mode.
The cluster nodes will be physical HP blades, and they will have shared storage for the Oracle data files on
an FC SAN. That is, a shared block device, but using HA-LVM so the filesystem is only mounted on one node at a time.

The way I see it, my fence options are fence_ipmilan, but I could also look at fence_scsi. Should I use only one
of these, or both? If both: in what order?
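
For reference, cluster.conf can list more than one fence method per node, and methods are tried in the order given, so IPMI power fencing can come first with SCSI reservation fencing as a fallback. A rough sketch follows; the names and addresses are placeholders, and the fence_scsi/unfence details should be checked against the fence_scsi(8) man page:

    <clusternode name="node1" nodeid="1">
            <fence>
                    <method name="power">
                            <device name="ipmi-node1" action="reboot"/>
                    </method>
                    <method name="scsi">
                            <device name="scsi"/>
                    </method>
            </fence>
            <unfence>
                    <device name="scsi" action="on"/>
            </unfence>
    </clusternode>
    ...
    <fencedevices>
            <fencedevice agent="fence_ipmilan" name="ipmi-node1" ipaddr="..." login="..." passwd="..." lanplus="1"/>
            <fencedevice agent="fence_scsi" name="scsi"/>
    </fencedevices>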

Regards
Eivind Olsen

--

-- 
Linux-cluster mailing list
Linux-cluster <at> redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

Amjad Syed | 24 Jun 12:32 2014

Error in Cluster.conf

Hello

I am getting the following error when I run ccs_config_validate:

ccs_config_validate
Relax-NG validity error : Extra element clusternodes in interleave
tempfile:12: element clusternodes: Relax-NG validity error : Element cluster failed to validate content
Configuration fails to validate

Here is my cluster.conf file

<?xml version="1.0"?>
<cluster config_version="1" name="oracleha">
        <clusternodes>
                <clusternode name="krplporcl001" nodeid="1"/>
                <clusternode name="krplporcl002" nodeid="2"/>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>

        <fencedevices>
           <fencedevice agent= "fence_ipmilan" ipaddr="10.10.63.93" login="ADMIN" name="inspuripmi" passwd="abc123"/>
           <fencedvice agent = "fence_ilo" ipaddr="10.10.63.92" login="test" name="hpipmi" passwd="abc12345"/>
          </fencedevices>
        <clusternodes>
           <clusternode name= "krplporcl001"  nodeid="1" votes= "1">
           <fence>
               <method name  = "1">
                 <device lanplus = "" name="fence_node1"  action ="reboot"/>
                 </method>
            </fence>
           </clusternode>
            <clusternode name = "krplporcl002" nodeid="2" votes ="1">
                 <fence>
                 <method name = "1">
                 <device lanplus ="1" name="fence_node2" action ="reboot"/>
                  </method>
               </fence>
            </clusternode>
         </clusternodes>


        <rm>

          <failoverdomains/>
        <resources/>
        <service autostart="1" exclusive="0" name="IP" recovery="relocate">
                <ip address="10.10.5.23" monitor_link="on" sleeptime="10"/>
        </service>
</rm>
</cluster>


Any help would be appreciated
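
For reference, the Relax-NG error is most likely caused by the two separate <clusternodes> sections (the schema allows only one), together with the misspelled <fencedvice> element and <device> names ("fence_node1"/"fence_node2") that do not match any defined fence device. A merged sketch is shown below; which fence device belongs to which node is an assumption and should be corrected for the real hardware:

    <?xml version="1.0"?>
    <cluster config_version="2" name="oracleha">
            <cman expected_votes="1" two_node="1"/>
            <clusternodes>
                    <clusternode name="krplporcl001" nodeid="1" votes="1">
                            <fence>
                                    <method name="1">
                                            <device name="hpipmi" action="reboot"/>
                                    </method>
                            </fence>
                    </clusternode>
                    <clusternode name="krplporcl002" nodeid="2" votes="1">
                            <fence>
                                    <method name="1">
                                            <device name="inspuripmi" lanplus="1" action="reboot"/>
                                    </method>
                            </fence>
                    </clusternode>
            </clusternodes>
            <fencedevices>
                    <fencedevice agent="fence_ilo" ipaddr="10.10.63.92" login="test" name="hpipmi" passwd="abc12345"/>
                    <fencedevice agent="fence_ipmilan" ipaddr="10.10.63.93" login="ADMIN" name="inspuripmi" passwd="abc123"/>
            </fencedevices>
            <rm>
                    <failoverdomains/>
                    <resources/>
                    <service autostart="1" exclusive="0" name="IP" recovery="relocate">
                            <ip address="10.10.5.23" monitor_link="on" sleeptime="10"/>
                    </service>
            </rm>
    </cluster>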


--

-- 
Linux-cluster mailing list
Linux-cluster <at> redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
Vasil Valchev | 23 Jun 20:09 2014

Online change of fence device options - possible?

Hello,

I have a RHEL 6.5 cluster, using rgmanager.
The fence devices are fence_ipmilan - fencing through HP iLO4.

The issue is that the fence devices weren't configured entirely correctly: recently, after a node failure, the fence agent was returning failures (even though it was fencing the node successfully), which apparently can be avoided by setting the power_wait option on the fence device configuration.

My question is: after changing the fence device (I think editing the .conf directly will be fine?), incrementing the config version, and syncing the .conf through the cluster software, is anything else necessary to apply the change (e.g. a cman reload)?

Will the new fence option be used the next time a fencing action is performed?

And lastly, can all of this be performed while the cluster and services are operational, or do they have to be stopped/restarted?
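
For reference, the usual online procedure on RHEL 6 is roughly the sketch below; no cman restart is needed, and the running cluster and services stay up. Depending on the minor release, propagation goes through ricci/ccs_sync, so verify the details against the Cluster Administration guide:

    # 1. Edit /etc/cluster/cluster.conf: add power_wait to the fence device and bump config_version, e.g.
    #    <fencedevice agent="fence_ipmilan" name="ilo4-node1" ... power_wait="4"/>
    # 2. Validate the new file:
    ccs_config_validate
    # 3. Push the new version to all running cluster nodes:
    cman_tool version -r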


Regards,
Vasil

--

-- 
Linux-cluster mailing list
Linux-cluster <at> redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
Amjad Syed | 22 Jun 09:55 2014

fence Agent

Hello,

I am trying to set up a simple 2-node cluster in active/passive mode for Oracle high availability.

We are using one Inspur server and one HP ProLiant (a management decision based on hardware availability), and we are checking whether we can use IPMI as the fencing method.

CCHS, though, supports HP iLO, Dell IPMI, and IBM, but not Inspur.

So the basic question I have is: can we use fence_ilo (for the HP) and fence_ipmilan (for the Inspur)?

If anyone has experience with fence_ipmilan, or can point to resources, it would really be appreciated.
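
For reference, a fence agent can be exercised by hand before wiring it into cluster.conf; a sketch for fence_ipmilan follows (address and credentials are placeholders, and -P enables IPMI lanplus, which many BMCs require):

    fence_ipmilan -a 10.10.63.93 -l ADMIN -p secret -P -o status
    fence_ipmilan -a 10.10.63.93 -l ADMIN -p secret -P -o reboot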

Sincerely,
Amjad
--

-- 
Linux-cluster mailing list
Linux-cluster <at> redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
Masatake YAMATO | 20 Jun 04:07 2014

Fw: [corosync] wireshark dissector for corosync 1.x srp

If you have trouble with lower-layer communication in Cluster 3,
Wireshark can help you understand it.

Masatake YAMATO
From: Masatake YAMATO <yamato <at> redhat.com>
Subject: [corosync] wireshark dissector for corosync 1.x srp
Date: 2014-06-20 02:03:36 GMT
The Wireshark dissector for corosync 1.x SRP has finally been merged into the
official Wireshark source tree. Starting with the next release, a stock
Wireshark (no patches needed) can dissect the lower-layer traffic of your cluster!

https://code.wireshark.org/review/#/c/725/
https://code.wireshark.org/review/gitweb?p=wireshark.git;a=blob;f=epan/dissectors/packet-corosync-totemnet.c
https://code.wireshark.org/review/gitweb?p=wireshark.git;a=blob;f=epan/dissectors/packet-corosync-totemsrp.c

Here is the old documentation on how to let Wireshark know corosync's decryption key:

    https://github.com/masatake/wireshark-plugin-rhcs
    https://github.com/masatake/wireshark-plugin-rhcs/blob/master/screenshots/corosync_totemnet__pref.png

I'll continue to work on upper layers and corosync 2.

It took more than 7 years to get this merged. During that period I had a son,
and he is now a schoolboy. Thank you to everyone who gave me
advice about the protocols.

Masatake YAMATO
_______________________________________________
discuss mailing list
discuss <at> corosync.org
http://lists.corosync.org/mailman/listinfo/discuss
--

-- 
Linux-cluster mailing list
Linux-cluster <at> redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
Andreas Haralambopoulos | 19 Jun 16:08 2014

Openvpn as a service in RGManager

Is it possible to run a VPN service in rgmanager on only one node?

Something like this in Pacemaker:

primitive p_openvpn ocf:heartbeat:anything \
        params binfile="/usr/sbin/openvpn" cmdline_options="--daemon --writepid /var/run/openvpn.pid --config /data/openvpn/server.conf --cd /data/openvpn" pidfile="/var/run/openvpn.pid" \
        op start timeout="20" \
        op stop timeout="30" \
        op monitor interval="20" \
        meta target-role="Started"
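
For comparison, the closest rgmanager equivalent (a sketch, assuming an LSB init script exists at /etc/init.d/openvpn) is a script resource inside a non-exclusive service, which rgmanager will run on exactly one node at a time and relocate on failure:

    <rm>
            <service autostart="1" name="openvpn" recovery="relocate">
                    <script name="openvpn-init" file="/etc/init.d/openvpn"/>
            </service>
    </rm>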

--

-- 
Linux-cluster mailing list
Linux-cluster <at> redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

