Manuel Galán (UJA) | 18 Apr 19:44 2014

Re: Linux-cluster Digest, Vol 120, Issue 5

Hello all, 
    What about oVirt? Visit ovirt.org

Good weekend...


Sent from Samsung Mobile



-------- Original message --------
From: linux-cluster-request <at> redhat.com
Date: 18/04/2014 18:00 (GMT+01:00)
To: linux-cluster <at> redhat.com
Subject: Linux-cluster Digest, Vol 120, Issue 5


Send Linux-cluster mailing list submissions to
linux-cluster <at> redhat.com

To subscribe or unsubscribe via the World Wide Web, visit
https://www.redhat.com/mailman/listinfo/linux-cluster
or, via email, send a message with subject or body 'help' to
linux-cluster-request <at> redhat.com

You can reach the person managing the list at
linux-cluster-owner <at> redhat.com

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Linux-cluster digest..."


Today's Topics:

   1. Re: KVM availability groups (Pavel Herrmann)


----------------------------------------------------------------------

Message: 1
Date: Thu, 17 Apr 2014 21:16:22 +0200
From: Pavel Herrmann <morpheus.ibis <at> gmail.com>
To: linux-cluster <at> redhat.com
Cc: "Henley, David \(Solutions Architect Chicago\)"
<david.l.henley <at> hp.com>
Subject: Re: [Linux-cluster] KVM availability groups
Message-ID: <2179081.39Bc0pasea <at> bloomfield>
Content-Type: text/plain; charset="us-ascii"

Hi,

I am not an expert in this, but as far as I understand, it works like this:

On Thursday 17 of April 2014 13:20:11 Henley, David wrote:
> I have 8 to 10 Rack mount Servers running Red Hat KVM.
> I need to create 2 availability zones and a backup zone.
>
>
> 1.       What tools do you use to create these? Is it always scripted or is
> there an open source interface similar to say Vcenter.

There are vCenter-like interfaces, but I'm not sure how they handle HA; have a
look at Ganeti and/or OpenStack.

This list is rather more concerned with the low-level workings of clustered
systems, with tools such as cman or pacemaker (depending on your OS version; I
think all current RHEL versions use cman) to monitor and manage the
availability of your services (a VM is a service in this context), and
corosync to keep your cluster in a consistent state.
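
For illustration only (a sketch, with a made-up resource name and config
path), a pacemaker-managed VM defined through the ocf:heartbeat:VirtualDomain
agent looks roughly like this:

crm configure primitive my_vm ocf:heartbeat:VirtualDomain \
        params config="/etc/libvirt/qemu/my_vm.xml" hypervisor="qemu:///system" \
        op monitor interval=30s timeout=30s \
        op start timeout=120s op stop timeout=120s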

If you are looking for a vSphere replacement, you might have better luck with
OpenStack than with tinkering with Linux clustering directly, in my opinion.


> 2.       Are there KVM tools that monitor the zones?

You would probably use the libvirt interface to manage your KVM instances.
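
For example (a sketch only; guest and host names are placeholders), the virsh
front end to libvirt can list and live-migrate guests:

virsh list --all
virsh migrate --live guest01 qemu+ssh://other-host/system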

regards,
Pavel Herrmann



------------------------------

--
Linux-cluster mailing list
Linux-cluster <at> redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

End of Linux-cluster Digest, Vol 120, Issue 5
*********************************************

Henley, David | 17 Apr 13:20 2014

KVM availability groups

I have 8 to 10 rack-mount servers running Red Hat KVM.

I need to create 2 availability zones and a backup zone.

 

1.       What tools do you use to create these? Is it always scripted, or is there an open source interface similar to, say, vCenter?

2.       Are there KVM tools that monitor the zones?

 

Thanks Dave

 

David Henley

Solutions Architect

Hewlett-Packard Company

+1 815 341 2463

dhenley <at> hp.com

 

Digimer | 15 Apr 23:03 2014

KVM Live migration when node's FS is read-only

Hi all,

   So I hit a weird issue last week... (EL6 + cman + rgmanager + drbd)

   For reasons unknown, a client thought they could start yanking and 
replacing hard drives on a running node. Obviously, that did not end 
well. The VMs that had been running on the node continued to operate 
fine; they just started using the peer's storage.

   The problem came when I tried to live-migrate the VMs over to the 
still-good node. Obviously, the old host couldn't write to its logs, and 
the live migration failed. Once the migration failed, rgmanager stopped 
working as well. In the end, I had to manually fence the node (corosync 
never failed, so it didn't get fenced automatically).

   This obviously caused the VMs running on the node to reboot, resulting 
in a ~40 second outage. It strikes me that the system *should* have been 
able to migrate, had it not tried to write to the logs.

   Is there a way, or could a way be added, to migrate VMs off of a 
node whose underlying FS is read-only/corrupt/destroyed, so long as the 
programs in memory are still working?

   I am sure this is partly an rgmanager question, partly a KVM/qemu question.
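
   For what it's worth, the manual route would be to drive the migration 
through libvirt directly, bypassing rgmanager (just a sketch, untested 
against a read-only root, with placeholder guest and host names):

virsh -c qemu:///system migrate --live vm01 qemu+ssh://peer-node/system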

Thanks for any feedback!

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?


Bjoern Teipel | 7 Apr 09:26 2014

DLM nodes disconnected issue

Hi all,

I did a dlm_tool leave clvmd on one node (node06) of a CMAN cluster with CLVMD.
Now I have the problem that clvmd is stuck and all nodes have lost their
connections to DLM. For some reason DLM wants to fence member 8, I guess,
and that might be stalling the whole DLM?
All other stacks (cman, corosync) look fine...
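
For reference, the status and dump output below came from the usual
inspection commands, roughly:

cman_tool nodes
dlm_tool ls
dlm_tool dump

I assume the counterpart of the leave would be a plain "dlm_tool join clvmd",
but presumably not while the lockspace sits in kern_stop waiting for fencing.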

Thanks,
Bjoern

Error:

dlm: closing connection to node 2
dlm: closing connection to node 3
dlm: closing connection to node 4
dlm: closing connection to node 5
dlm: closing connection to node 6
dlm: closing connection to node 8
dlm: closing connection to node 9
dlm: closing connection to node 10
dlm: closing connection to node 2
dlm: closing connection to node 3
dlm: closing connection to node 4
dlm: closing connection to node 5
dlm: closing connection to node 6
dlm: closing connection to node 8
dlm: closing connection to node 9
dlm: closing connection to node 10
INFO: task dlm_tool:33699 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
dlm_tool      D 0000000000000003     0 33699  33698 0x00000080
 ffff88138905dcc0 0000000000000082 ffffffff81168043 ffff88138905dd18
 ffff88138905dd08 ffff88305b30ccc0 ffff88304fa5c800 ffff883058e49900
 ffff881857329058 ffff88138905dfd8 000000000000fb88 ffff881857329058
Call Trace:
 [<ffffffff81168043>] ? kmem_cache_alloc_trace+0x1a3/0x1b0
 [<ffffffff8132f79a>] ? misc_open+0x1ca/0x320
 [<ffffffff81510725>] rwsem_down_failed_common+0x95/0x1d0
 [<ffffffff81185505>] ? chrdev_open+0x125/0x230
 [<ffffffff815108b6>] rwsem_down_read_failed+0x26/0x30
 [<ffffffff8117e5ff>] ? __dentry_open+0x23f/0x360
 [<ffffffff81283894>] call_rwsem_down_read_failed+0x14/0x30
 [<ffffffff8150fdb4>] ? down_read+0x24/0x30
 [<ffffffffa06d948d>] dlm_clear_proc_locks+0x3d/0x2a0 [dlm]
 [<ffffffff811dfed6>] ? generic_acl_chmod+0x46/0xd0
 [<ffffffffa06e4b36>] device_close+0x66/0xc0 [dlm]
 [<ffffffff81182b45>] __fput+0xf5/0x210
 [<ffffffff81182c85>] fput+0x25/0x30
 [<ffffffff8117e0dd>] filp_close+0x5d/0x90
 [<ffffffff8117e1b5>] sys_close+0xa5/0x100
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b

Status:

cman_tool nodes
Node  Sts   Inc   Joined               Name
   1   M  18908   2014-03-24 19:01:00  node01
   2   M  18972   2014-04-06 22:47:57  node02
   3   M  18972   2014-04-06 22:47:57  node03
   4   M  18972   2014-04-06 22:47:57  node04
   5   M  18972   2014-04-06 22:47:57  node05
   6   X  18960                        node06
   7   X  18928                        node07
   8   M  18972   2014-04-06 22:47:57  node08
   9   M  18972   2014-04-06 22:47:57  node09
  10   M  18972   2014-04-06 22:47:57  node10

dlm lockspaces
name          clvmd
id            0x4104eefa
flags         0x00000004 kern_stop
change        member 8 joined 0 remove 1 failed 0 seq 11,11
members       1 2 3 4 5 8 9 10
new change    member 8 joined 1 remove 0 failed 0 seq 12,41
new status    wait_messages 0 wait_condition 1 fencing
new members   1 2 3 4 5 8 9 10

DLM dump:
1396849677 cluster node 2 added seq 18972
1396849677 set_configfs_node 2 10.14.18.66 local 0
1396849677 cluster node 3 added seq 18972
1396849677 set_configfs_node 3 10.14.18.67 local 0
1396849677 cluster node 4 added seq 18972
1396849677 set_configfs_node 4 10.14.18.68 local 0
1396849677 cluster node 5 added seq 18972
1396849677 set_configfs_node 5 10.14.18.70 local 0
1396849677 cluster node 8 added seq 18972
1396849677 set_configfs_node 8 10.14.18.80 local 0
1396849677 cluster node 9 added seq 18972
1396849677 set_configfs_node 9 10.14.18.81 local 0
1396849677 cluster node 10 added seq 18972
1396849677 set_configfs_node 10 10.14.18.77 local 0
1396849677 dlm:ls:clvmd conf 2 1 0 memb 1 3 join 3 left
1396849677 clvmd add_change cg 35 joined nodeid 3
1396849677 clvmd add_change cg 35 counts member 2 joined 1 remove 0 failed 0
1396849677 dlm:ls:clvmd conf 3 1 0 memb 1 2 3 join 2 left
1396849677 clvmd add_change cg 36 joined nodeid 2
1396849677 clvmd add_change cg 36 counts member 3 joined 1 remove 0 failed 0
1396849677 dlm:ls:clvmd conf 4 1 0 memb 1 2 3 9 join 9 left
1396849677 clvmd add_change cg 37 joined nodeid 9
1396849677 clvmd add_change cg 37 counts member 4 joined 1 remove 0 failed 0
1396849677 dlm:ls:clvmd conf 5 1 0 memb 1 2 3 8 9 join 8 left
1396849677 clvmd add_change cg 38 joined nodeid 8
1396849677 clvmd add_change cg 38 counts member 5 joined 1 remove 0 failed 0
1396849677 dlm:ls:clvmd conf 6 1 0 memb 1 2 3 8 9 10 join 10 left
1396849677 clvmd add_change cg 39 joined nodeid 10
1396849677 clvmd add_change cg 39 counts member 6 joined 1 remove 0 failed 0
1396849677 dlm:ls:clvmd conf 7 1 0 memb 1 2 3 5 8 9 10 join 5 left
1396849677 clvmd add_change cg 40 joined nodeid 5
1396849677 clvmd add_change cg 40 counts member 7 joined 1 remove 0 failed 0
1396849677 dlm:ls:clvmd conf 8 1 0 memb 1 2 3 4 5 8 9 10 join 4 left
1396849677 clvmd add_change cg 41 joined nodeid 4
1396849677 clvmd add_change cg 41 counts member 8 joined 1 remove 0 failed 0
1396849677 dlm:controld conf 2 1 0 memb 1 3 join 3 left
1396849677 dlm:controld conf 3 1 0 memb 1 2 3 join 2 left
1396849677 dlm:controld conf 4 1 0 memb 1 2 3 9 join 9 left
1396849677 dlm:controld conf 5 1 0 memb 1 2 3 8 9 join 8 left
1396849677 dlm:controld conf 6 1 0 memb 1 2 3 8 9 10 join 10 left
1396849677 dlm:controld conf 7 1 0 memb 1 2 3 5 8 9 10 join 5 left
1396849677 dlm:controld conf 8 1 0 memb 1 2 3 4 5 8 9 10 join 4 left


Vallevand, Mark K | 3 Apr 22:58 2014

Simple data replication in a cluster

I’m looking for a simple way to replicate data within a cluster.

 

It looks like my resources will be self-configuring and may need to push changes they see to all nodes in the cluster.  The idea being that when a node crashes, the resource will have its configuration present on the node on which it is restarted.  We’re talking about a few kb of data, probably in one file, probably text.  A typical cluster would have multiple resources (more than two), one resource per node and one extra node.

 

Ideas?

 

Could I use the CIB directly to replicate data?  Use cibadmin to update something and sync?

How big can a resource parameter be?  Could a resource modify its parameters so that they are replicated throughout the cluster?

Is there a simple file replication Resource Agent?

DRBD seems like overkill.
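
For the cibadmin idea, one sketch I am considering (assuming the data really
stays at a few KB; the attribute and file names below are made up) is to stash
the blob as a cluster property, since pacemaker already replicates the CIB to
every node:

crm_attribute --type crm_config --name myres_config --update "$(cat /etc/myres/config.txt)"
crm_attribute --type crm_config --name myres_config --query

The CIB is not meant as a data store, so this only seems sane for very small,
plain-text payloads.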

 

Regards.
Mark K Vallevand   Mark.Vallevand <at> Unisys.com

May you live in interesting times, may you come to the attention of important people and may all your wishes come true.


 

Hamid Jafarian | 30 Mar 16:34 2014

GFS2 unformat helper tool

Hi,

We developed GFS2 volume unformat helper tool.
Read about this code at:
http://pdnsoft.com/en/web/pdnen/blog/-/blogs/gfs2-unformat-helper-tool-1

Regards

-- 
Hamid Jafarian
CEO at PDNSoft Co.
Web site: http://www.pdnsoft.com
Blog:     http://jafarian.pdnsoft.com


bergman | 28 Mar 17:37 2014

mixing OS versions?


I've got a 3-node cluster under CentOS5.

I'd like to add 3 additional nodes, running CentOS6.

Are there any known issues, guidelines, or recommendations for having
a single RHCS cluster with different OS releases on the nodes?

Thanks,

Mark


Eduar Arley | 28 Mar 07:06 2014

IP address for replies

Hello everyone.

I have an HA cluster on CentOS 6, using CMAN + RGManager (the
supported and 'official' stack in CentOS 6.5). Everything works OK;
however, there is one thing that could give me problems in the future,
so I would like to think about how to solve it now.

When an incoming packet arrives at my cluster (through a floating IP
address), my active node receives it OK; however, it replies from its
'real' IP address, not from the floating IP. As I'm deploying SIP on
this cluster, some provider might dislike this mismatch in the future
and reject my calls.

I've read that Heartbeat has a resource agent for this, called
IPsrcaddr; however, I don't see a similar resource in the Conga web
interface or in the Red Hat documentation. On other websites I've read
about a workaround involving IP routing rule tables, but I don't
think that is an optimal solution.
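
The workaround I mean looks roughly like this (addresses are placeholders;
192.0.2.100 would be the floating IP and 192.0.2.1 the default gateway, and
it has to be reapplied on whichever node currently holds the floating IP):

ip route replace default via 192.0.2.1 dev eth0 src 192.0.2.100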

Does anybody know a way to fix this in my scenario?

Thanks!

Eduar Cardona


Vallevand, Mark K | 19 Mar 20:55 2014

Resource instance is getting restarted when a node is rebooted

I’m testing my cluster configuration by rebooting nodes to see what happens.  I can’t explain what I see in some cases.

 

The setup:  I have a cloned resource with its own agent and an IP address resource that is collocated with the cloned resource.  The IP address doesn’t need to run on all of the nodes running an instance of the cloned resource.  It just needs to be on one of the nodes.  It’s not cloned or meant to be load-balanced.

 

I do something like this:

crm -F configure <<EOF
primitive IP ocf:heartbeat:IPaddr2 \
        params ip=10.1.1.1 nic=eth0 cidr_netmask=24 \
        op monitor interval=30s timeout=20s \
        op start timeout=30s \
        op stop timeout=30s
primitive P ocf:heartbeat:my_agent \
        op monitor interval=30s timeout=10s \
        op start timeout=30s \
        op stop timeout=30s
clone P_clone P \
        meta clone-max=2 notify="true" clone-node-max=1
colocation P_withIP INFINITY: IP P_clone
order P_AfterIP INFINITY: IP P_clone
commit
exit
EOF

 

This works great. In my 3-node system, node1 has IP and P:0 on it, node2 has P:1 on it, and node3 has nothing on it.

Reboot node2.  I see P:1 start on node3.  Good.

Reboot node3.  I see P:1 start on node2.  Good.

Reboot node1.  I see P:0 and IP start on node3.  Good.  And I see P:1 restart on node2. 

What’s up with that?

Have I done my colocation incorrectly? If I reboot the node that has the IP resource on it, all instances of P_clone move or restart.
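
Is the mandatory ordering the culprit? I wonder whether flipping the dependency, so that IP follows the clone rather than the clone following IP, would behave differently. Just a sketch, untested, with arbitrary constraint names replacing P_withIP and P_AfterIP above:

crm -F configure <<EOF
delete P_withIP P_AfterIP
colocation IP_with_P INFINITY: IP P_clone
order IP_after_P INFINITY: P_clone IP
commit
exit
EOF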

 

Any ideas are very welcome.

 

Regards.
Mark K Vallevand   Mark.Vallevand <at> Unisys.com

May you live in interesting times, may you come to the attention of important people and may all your wishes come true.


 

Marek Grac | 19 Mar 15:37 2014

Re: joining the fence-agents group

On 03/18/2014 12:20 AM, David Smith wrote:
> any chance you can sponsor and approve so i can submit the code via git?
>
> or, if you prefer, I can send you the code modifications.
>
Hi,

Sorry for the late response.

Write access to the git repository is still limited to a very small group 
of people, and I will be happy to add you there after you become a regular 
contributor. For now, please send a patch to cluster-devel <at> redhat.com, 
where the code review will be done. After that review, we will add your code 
upstream using git-am, so you will be preserved as the author.
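
For reference, the usual way to produce and send such a patch (assuming the 
change is committed on a local branch) is roughly:

git format-patch -1 HEAD
git send-email --to=cluster-devel@redhat.com 0001-*.patch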

m,


Digimer | 19 Mar 02:27 2014

Adding a stop timeout to a VM service using 'ccs'

Hi all,

   I would like to tell rgmanager to give more time for VMs to stop. I 
want this:

<vm name="vm01-win2008" domain="primary_n01" autostart="0" 
path="/shared/definitions/" exclusive="0" recovery="restart" 
max_restarts="2" restart_expire_time="600">
   <action name="stop" timeout="10m" />
</vm>

I already use ccs to create the entry:

<vm name="vm01-win2008" domain="primary_n01" autostart="0" 
path="/shared/definitions/" exclusive="0" recovery="restart" 
max_restarts="2" restart_expire_time="600"/>

via:

ccs -h localhost --activate --sync --password "secret" \
  --addvm vm01-win2008 \
  --domain="primary_n01" \
  path="/shared/definitions/" \
  autostart="0" \
  exclusive="0" \
  recovery="restart" \
  max_restarts="2" \
  restart_expire_time="600"

I'm hoping it's a simple additional switch. :)
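
If it is not, my fallback (just a sketch) would be to add the <action> child 
to /etc/cluster/cluster.conf by hand, bump config_version, and push it out 
with the same sync/activate options as above:

# after editing /etc/cluster/cluster.conf (add <action>, bump config_version)
ccs -h localhost --sync --activate --password "secret"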

Thanks!

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?
