Denny Schierz | 18 Feb 2011 10:08

Build failover ZFS, like HA-Storage from Solaris

hi,

We're looking for an alternative failover solution with ZFS. We have
two nodes connected to _one_ SAS storage unit (so no DRBD or anything
similar is possible), and we want to export ZFS volumes via iSCSI to other systems.

If the primary node fails, the ZFS pool (which is also built as 8x raidz2)
has to be moved to the secondary node. Once that is done, the global IP
(CARP?) switches to the new node.

It works with HA-Storage from Solaris 10, but the licenses are too
expensive on non-Sun hardware :-/ for our university.

Any solutions?
Freddie Cash | 18 Feb 2011 21:29

Re: Build failover ZFS, like HA-Storage from Solaris

On Fri, Feb 18, 2011 at 1:08 AM, Denny Schierz <linuxmail <at> 4lin.net> wrote:
> We're looking for an alternative failover solution with ZFS. We have
> two nodes connected to _one_ SAS storage unit (so no DRBD or anything
> similar is possible), and we want to export ZFS volumes via iSCSI to other systems.
>
> If the primary node fails, the ZFS pool (which is also built as 8x raidz2)
> has to be moved to the secondary node. Once that is done, the global IP
> (CARP?) switches to the new node.
>
> It works with HA-Storage from Solaris 10, but the licenses are too
> expensive on non-Sun hardware :-/ for our university.
>
> Any solutions?

FreeBSD + ZFS + HAST + CARP + devd will do what you want.

You create a separate hast device for each physical harddrive in the
system.  That "mirrors" the drives between the two servers.

Then you create the ZFS pool on top of the hast devices (use
/dev/hast/* instead of /dev/da*).
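
As a rough illustration of that step, here is a minimal Python sketch
that only builds the `zpool create` command line on top of HAST
providers. The pool name "tank" and the resource names disk0..disk3 are
hypothetical; nothing here comes from a real configuration.

```python
def hast_pool_command(pool, resources, vdev_type="raidz2"):
    """Return the argv for creating a pool on HAST providers.

    `resources` are HAST resource names (hypothetical here); each one
    shows up as /dev/hast/<name> on the node that is currently primary.
    """
    providers = ["/dev/hast/" + name for name in resources]
    return ["zpool", "create", pool, vdev_type] + providers


if __name__ == "__main__":
    # Print the command instead of running it, so this is safe anywhere.
    argv = hast_pool_command("tank", ["disk0", "disk1", "disk2", "disk3"])
    print(" ".join(argv))
```

The command is printed rather than executed, so it can be reviewed
before being run by hand on the primary node.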

Then you configure CARP to provide the shared virtual IP between the
two systems.  Configure your iSCSI setup to use this IP.

Then you write some scripts to handle the orderly tear down of the ZFS
pool on one system, and to handle the orderly importing of the pool on
the other system.  And you hook those scripts into devd, so that when
CARP advertises that it is switching which system is master, ZFS
and iSCSI switch with it.
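
A minimal sketch of such a script, in Python, might look like the
following. It assumes devd has been configured to call the script with
the new CARP state ("MASTER" or "BACKUP") as its first argument; the
pool name "tank" and the iSCSI service name "istgt" are assumptions for
illustration only. It prints the ordered steps rather than running them.

```python
import sys


def failover_plan(state, pool="tank", iscsi_service="istgt"):
    """Return the ordered commands for a CARP state change.

    Becoming MASTER: import the pool first, then start the iSCSI target.
    Becoming BACKUP: stop the iSCSI target first, then export the pool.
    """
    if state == "MASTER":
        return [["zpool", "import", pool],
                ["service", iscsi_service, "start"]]
    if state == "BACKUP":
        return [["service", iscsi_service, "stop"],
                ["zpool", "export", pool]]
    raise ValueError("unexpected CARP state: %r" % (state,))


if __name__ == "__main__":
    # Dry run: show what would be executed for the given state.
    new_state = sys.argv[1] if len(sys.argv) > 1 else "BACKUP"
    if new_state in ("MASTER", "BACKUP"):
        for argv in failover_plan(new_state):
            print(" ".join(argv))
```

The ordering is the point: the iSCSI target must not be serving
volumes from a pool that is not imported on that node.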

Denny Schierz | 19 Feb 2011 00:44

Re: Build failover ZFS, like HA-Storage from Solaris

hi,

On 18.02.2011 at 21:29, Freddie Cash wrote:

> You create a separate hast device for each physical harddrive in the
> system.  That "mirrors" the drives between the two servers.

Why should I mirror the drives when both systems are connected to one SAS storage unit? Both hosts can see all
drives at the same time via the SAS HBA.

cu denny
_______________________________________________
freebsd-cluster <at> freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-cluster
To unsubscribe, send any mail to "freebsd-cluster-unsubscribe <at> freebsd.org"

Freddie Cash | 19 Feb 2011 02:39

Re: Build failover ZFS, like HA-Storage from Solaris

On Fri, Feb 18, 2011 at 3:44 PM, Denny Schierz <linuxmail <at> 4lin.net> wrote:
> On 18.02.2011 at 21:29, Freddie Cash wrote:
>
>> You create a separate hast device for each physical harddrive in the
>> system.  That "mirrors" the drives between the two servers.
>
> Why should I mirror the drives when both systems are connected to one SAS storage unit? Both hosts can see all
> drives at the same time via the SAS HBA.

Sorry, re-reading your original post I see that I mixed up the layers.  :)

To make sure I understand completely, you have:

[ SAN box ] ---- iSCSI ---- [ node 1 using ZFS ]
                            \-------- [ node 2 using ZFS ]

Correct?  And you want to fail-over services from node 1 to node 2?

You don't need HAST in that situation, as the SAN handles making the
storage available to both.  My bad.

But you can still use CARP and devd.  CARP provides the shared IP so
that other systems won't notice the switch over.  And devd provides
the hooks into your custom scripts so that when CARP switches from
node 1 to node 2, you export the pool on node 1, and import the pool
on node 2.

-- 
Freddie Cash
fjwcash <at> gmail.com

Denny Schierz | 20 Feb 2011 11:59

Re: Build failover ZFS, like HA-Storage from Solaris

hi,

On 19.02.2011 at 02:39, Freddie Cash wrote:

> And devd provides
> the hooks into your custom scripts so that when CARP switches from
> node 1 to node 2, you export the pool on node 1, and import the pool
> on node 2.

But how do I make sure I don't get a split brain? Or am I thinking about it the right way if I say "Only the node where the
CARP IP is active is allowed to import the ZFS pool"? But what happens if, after a power cut, both nodes are
powered on at the same time? I'm missing something like a quorum device or service.
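
To make the "only the CARP master may import" rule concrete, here is a
minimal Python sketch of such a check. It assumes that `ifconfig` on the
CARP interface prints a line of the form "carp: MASTER vhid ..."; the
sample text below is an assumed illustration of that format, not
captured output, and the interface name and address are hypothetical.
A check like this narrows the window but does not replace a quorum.

```python
import re


def carp_state(ifconfig_output):
    """Extract the CARP state from ifconfig text, or None if absent."""
    match = re.search(r"^\s*carp:\s+(MASTER|BACKUP|INIT)\b",
                      ifconfig_output, re.MULTILINE)
    return match.group(1) if match else None


def may_import(ifconfig_output):
    """Allow a pool import only while this node holds CARP MASTER."""
    return carp_state(ifconfig_output) == "MASTER"


# Assumed example of what the interface status might look like.
SAMPLE = (
    "carp0: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500\n"
    "\tinet 192.0.2.10 netmask 0xffffff00\n"
    "\tcarp: MASTER vhid 1 advbase 1 advskew 0\n"
)

if __name__ == "__main__":
    print(carp_state(SAMPLE), may_import(SAMPLE))
```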

cu denny

Josh Paetzel | 20 Feb 2011 14:49

Re: Build failover ZFS, like HA-Storage from Solaris

On Feb 20, 2011, at 4:59 AM, Denny Schierz <linuxmail <at> 4lin.net> wrote:

> hi,
> 
> On 19.02.2011 at 02:39, Freddie Cash wrote:
> 
>> And devd provides
>> the hooks into your custom scripts so that when CARP switches from
>> node 1 to node 2, you export the pool on node 1, and import the pool
>> on node 2.
> 
> But how do I make sure I don't get a split brain? Or am I thinking about it the right way if I say "Only the node where the
> CARP IP is active is allowed to import the ZFS pool"? But what happens if, after a power cut, both nodes
> are powered on at the same time? I'm missing something like a quorum device or

At boot, CARP devices have a delay that you set manually. If both machines are powered on at the same time, that
mechanism prevents both heads from asserting CARP MASTER. Of course it's imperfect, and a staggered power-on
can defeat the delay. In practice, it's pretty rare. What can make CARP lose its mind is that it uses the
interface config for a checksum. If the interface config differs, both sides go MASTER. At that point you
start getting 50% of your IP traffic to each host, as the MAC address in the switch flaps, and so forth. Your
scripts probably need to down the CARP device if the ZFS import fails.
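
A minimal Python sketch of that last point, with hypothetical pool and
interface names: try the import, and if it fails, take the CARP
interface down so this node stops claiming the shared IP. The `run`
parameter is only there so the logic can be exercised without touching
a real system.

```python
import subprocess


def import_or_step_down(pool, carp_if, run=subprocess.call):
    """Import the pool; on failure, down the CARP interface.

    `run` takes an argv list and returns the exit status
    (subprocess.call by default). Returns True if the import worked.
    """
    if run(["zpool", "import", pool]) == 0:
        return True
    # Import failed: give up the shared IP rather than serve nothing.
    run(["ifconfig", carp_if, "down"])
    return False
```

Called as import_or_step_down("tank", "carp0") from the failover script,
it would leave the other node free to take over the address.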

The reality of two node HA is that split brain is an unavoidable issue. Ancient sailors knew this when they
needed precise timekeeping for navigation.  Take one clock to sea or three. If you have two clocks and they disagree...
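
The clock analogy is just majority voting. As a minimal Python sketch
(the function name and vote values are made up for illustration): with
one or three voters there is always a strict majority when at most one
is wrong, but two voters that disagree produce none.

```python
from collections import Counter


def majority(votes):
    """Return the value held by a strict majority of votes, else None."""
    if not votes:
        return None
    value, count = Counter(votes).most_common(1)[0]
    return value if count * 2 > len(votes) else None


if __name__ == "__main__":
    print(majority(["node1"]))                    # node1
    print(majority(["node1", "node2"]))           # None
    print(majority(["node1", "node2", "node1"]))  # node1
```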

In practice most of the things that cause split brain to happen would cause issues even if the rig didn't
split brain. 

Failover while there are active writes is far more of an issue than split brain...


Dmitriy Kirhlarov | 20 Feb 2011 14:42

Re: Build failover ZFS, like HA-Storage from Solaris

On 20.02.2011 13:59, Denny Schierz wrote:
> hi,
>
> On 19.02.2011 at 02:39, Freddie Cash wrote:
>
>> And devd provides
>> the hooks into your custom scripts so that when CARP switches from
>> node 1 to node 2, you export the pool on node 1, and import the pool
>> on node 2.
>
> But how do I make sure I don't get a split brain? Or am I thinking about it the right way if I say "Only the node where the
> CARP IP is active is allowed to import the ZFS pool"? But what happens if, after a power cut, both nodes
> are powered on at the same time? I'm missing something like a quorum device or service.

Take a look at net/clusterit.

You can create a quorum host with these utilities.

WBR