Andrew Beekhof | 2 May 13:15 2011

Re: [patch 1/2] Prune glib from stonith api, clean parameter lists.

ACK.

The following introduces a memory leak, but that can be the subject of
a follow-up patch:
+                    for( ; devices; devices = devices->next ) {
+		        fprintf( stdout, " %s\n", devices->value );
 		    }
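
For the eventual follow-up: the fix is to walk the list once more and
free each node after printing. A minimal sketch, assuming a singly
linked node that owns its value string; the real stonith API list type
and its free routine may differ:

    #include <stdlib.h>

    /* Hypothetical stand-in for the list node iterated above. */
    typedef struct kv_node {
        struct kv_node *next;
        char *value;
    } kv_node_t;

    static void kv_list_free(kv_node_t *head)
    {
        while (head != NULL) {
            kv_node_t *next = head->next; /* save link before freeing */
            free(head->value);            /* release the owned string */
            free(head);                   /* then the node itself */
            head = next;
        }
    }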

Thanks!

On Fri, Apr 29, 2011 at 6:46 PM, Marcus Barrow <mbarrow@...> wrote:
>
> Version 2 attached. This version includes the suggested commit message
> and removes any white space changes, which should keep the patch more
> on topic and shorter for review.
>
>
>
> ----- Original Message -----
> From: "Andrew Beekhof" <andrew@...>
> To: "The Pacemaker cluster resource manager" <pacemaker@...>
> Sent: Friday, April 29, 2011 3:15:08 AM
> Subject: Re: [Pacemaker] [patch 1/2] Prune glib from stonith api, clean parameter lists.
>
> There are some spurious whitespace changes that need to be removed and
> "more glib prune from API" isn't really acceptable as the commit
> message.
>
> <severity>: <system>: something descriptive
>

Andrew Beekhof | 2 May 13:17 2011

Re: [patch 2/2] Prune glib from stonith api, don't require glib mainloop

ack

On Fri, Apr 29, 2011 at 6:48 PM, Marcus Barrow <mbarrow@...> wrote:
>
> Version 2. I updated to apply cleanly on top of the previous patch and provided the following commit message:
>
> Medium: stonith: Allow clients to avoid use of Glib mainloop.
>
> Allow an application to avoid use of the Glib mainloop by polling
> on the asynchronous file descriptor provided in the connect call.
> Activity is then handled by using the newly provided dispatch api
> routine, which calls stonith_dispatch().
>
>
>
>
> ----- Original Message -----
> From: "Andrew Beekhof" <andrew@...>
> To: "The Pacemaker cluster resource manager" <pacemaker@...>
> Sent: Friday, April 29, 2011 3:21:24 AM
> Subject: Re: [Pacemaker] [patch 2/2] Prune glib from stonith api, don't require glib mainloop
>
> Again, "glib prune mainloop stuff" just isn't good enough as a commit message.
> You've gone to the effort to describe the reason for the patch below;
> that's the sort of stuff that should go in the commit message.
>
> The patch itself is good though.
>
> On Thu, Apr 28, 2011 at 5:49 PM, Marcus Barrow <mbarrow@...> wrote:
>>

Andrew Beekhof | 2 May 13:22 2011

Re: resource order question

On Fri, Apr 29, 2011 at 1:24 PM, Andreas Kurz <andreas.kurz@...> wrote:
> On 2011-04-29 09:23, Andrew Beekhof wrote:
>> On Fri, Apr 29, 2011 at 9:16 AM,  <u.schmeling@...> wrote:
>>>
>>> Hi all,
>>>
>>> Using the MailTo resource I want to generate a more specific message.
>>> For example: two resources, MailTo and Dummy, are running on a node
>>> (besides some other resources), with an order rule like "order
>>> Dummy-before-Notify inf: Dummy MailTo". So when the Dummy resource is
>>> started it will write a specific message to a file and MailTo will
>>> pick the message up. Now my question: given this order, on a stop
>>> event which resource will be stopped first?
>>
>> Notify, I'm afraid.
>
> try asymmetric order constraints:

Not recommended for people only just getting their feet wet though :)

>
> order start-Dummy-before-Notify inf: Dummy:start MailTo:start
> symmetrical=false
>
> order stop-Dummy-before-Notify inf: Dummy:stop MailTo:stop symmetrical=false
>
> Regards,
> Andreas
>
>>
>>> If Dummy were also stopped first, it would be possible to place a
>>> specific "down" message into the stop procedure of Dummy and pick it
>>> up again with MailTo.
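
For reference, the two asymmetric constraints suggested above correspond
to CIB XML along these lines; a sketch, with the id values chosen purely
for illustration:

    <rsc_order id="start-Dummy-before-Notify" score="INFINITY"
               first="Dummy" first-action="start"
               then="MailTo" then-action="start" symmetrical="false"/>
    <rsc_order id="stop-Dummy-before-Notify" score="INFINITY"
               first="Dummy" first-action="stop"
               then="MailTo" then-action="stop" symmetrical="false"/>

Because symmetrical="false" suppresses the implied reverse ordering,
both directions have to be spelled out explicitly, which is exactly
what makes the stop order controllable here.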

Andrew Beekhof | 2 May 13:23 2011

Re: [PATCH] Low: minor corrections in the spec file

On Sat, Apr 30, 2011 at 2:51 PM, Vadym Chepkov <vchepkov@...> wrote:
> # HG changeset patch
> # User Vadym Chepkov <vchepkov@...>
> # Date 1304167609 14400
> # Branch stable-1.0
> # Node ID a051be4bc03ea0daaf9a9beaf51298c52cc3f3b7
> # Parent  1554a83db0d3c3e546cfd3aaff6af1184f79ee87
> Low: minor corrections in the spec file
>
> diff --git a/pacemaker.spec.in b/pacemaker.spec.in
> --- a/pacemaker.spec.in
> +++ b/pacemaker.spec.in
> @@ -86,7 +86,7 @@
>  %endif
>
>  %if %{with heartbeat}
> -BuildRequires: heartbeat-devel heartbeat-libs
> +BuildRequires: heartbeat-devel
>  %endif

Strictly speaking, heartbeat-libs is required; it just gets pulled in
automatically.

>
>  %description
> @@ -165,7 +165,7 @@
>  rm -rf %{buildroot}
>  make DESTDIR=%{buildroot} docdir=%{pcmk_docdir} install
>
> -# Scripts that need should be executable

Andrew Beekhof | 2 May 14:07 2011

Re: Ordering set of resources, problem in ordering chain of resources

On Wed, Apr 20, 2011 at 9:09 AM, Rakesh K <rakirocker4236@...> wrote:
> Andrew Beekhof <andrew <at> ...> writes:
>
> Hi Andrew
>
> thanks for giving replies
>  sorry for troubling you  frequently

no problem

>
> here is the output of crm configure show xml

Doh. For some reason I thought show xml included the status.
Can you try "cibadmin -Ql" instead please?

> <?xml version="1.0" ?>
> <cib admin_epoch="0" crm_feature_set="3.0.1"
> dc-uuid="87b8b88e-3ded-4e34-8708-46f7afe62935" epoch="1120" have-quorum="1"
> num_updates="35" validate-with="pacemaker-1.0">
>  <configuration>
>    <crm_config>
>      <cluster_property_set id="cib-bootstrap-options">
>        <nvpair id="cib-bootstrap-options-dc-version" name="dc-version"
> value="1.0.9-89bd754939df5150de7cd76835f98fe90851b677"/>
>        <nvpair id="cib-bootstrap-options-cluster-infrastructure"
> name="cluster-infrastructure" value="Heartbeat"/>
>        <nvpair id="cib-bootstrap-options-stonith-enabled"
> name="stonith-enabled" value="false"/>
>        <nvpair id="cib-bootstrap-options-no-quorum-policy"

Dejan Muhamedagic | 2 May 15:37 2011

Re: [patch 2/2] Prune glib from stonith api, don't require glib mainloop

Hi,

On Fri, Apr 29, 2011 at 12:48:09PM -0400, Marcus Barrow wrote:
> 
> Version 2. I updated to apply cleanly on top of the previous patch and provided the following commit message:
> 
> Medium: stonith: Allow clients to avoid use of Glib mainloop.
> 
> Allow an application to avoid use of the Glib mainloop by polling
> on the asynchronous file descriptor provided in the connect call.
> Activity is then handled by using the newly provided dispatch api
> routine, which calls stonith_dispatch().

What is the merit of using another API instead of the mainloop?

Cheers,

Dejan

> 
> 
> 
> ----- Original Message -----
> From: "Andrew Beekhof" <andrew@...>
> To: "The Pacemaker cluster resource manager" <pacemaker@...>
> Sent: Friday, April 29, 2011 3:21:24 AM
> Subject: Re: [Pacemaker] [patch 2/2] Prune glib from stonith api, don't require glib mainloop
> 
> Again, "glib prune mainloop stuff" just isn't good enough as a commit message.
> You've gone to the effort to describe the reason for the patch below,

Andrew Beekhof | 2 May 15:41 2011

Re: [patch 2/2] Prune glib from stonith api, don't require glib mainloop

On Mon, May 2, 2011 at 3:37 PM, Dejan Muhamedagic <dejanmm@...> wrote:
> Hi,
>
> On Fri, Apr 29, 2011 at 12:48:09PM -0400, Marcus Barrow wrote:
>>
>> Version 2. I updated to apply cleanly on top of the previous patch and provided the following commit message:
>>
>> Medium: stonith: Allow clients to avoid use of Glib mainloop.
>>
>> Allow an application to avoid use of the Glib mainloop by polling
>> on the asynchronous file descriptor provided in the connect call.
>> Activity is then handled by using the newly provided dispatch api
>> routine, which calls stonith_dispatch().
>
> What is the merit of using another API instead of the mainloop?

Not another API, you'd just use poll().

FWIW, the intent here is to allow mainloop use without forcing it on others.
Pacemaker will continue to use mainloop; we're just simplifying things
for those that don't.
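
To make the pattern concrete: a minimal sketch of mainloop-free use,
poll()ing the asynchronous fd handed back by the connect call and
running the dispatch routine when it becomes readable. Only
stonith_dispatch() is named by the patch itself; the opaque handle type
and the exact signatures below are assumptions:

    #include <poll.h>

    typedef struct stonith_s stonith_t;        /* opaque client handle */
    extern int stonith_dispatch(stonith_t *st);

    static void poll_and_dispatch(stonith_t *st, int stonith_fd)
    {
        struct pollfd pfd = { .fd = stonith_fd, .events = POLLIN };

        for (;;) {
            if (poll(&pfd, 1, -1) < 0)         /* block until activity */
                break;                         /* error handling elided */
            if (pfd.revents & POLLIN)
                stonith_dispatch(st);          /* run pending callbacks */
        }
    }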

_______________________________________________
Pacemaker mailing list: Pacemaker@...
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


tariq fillah | 2 May 18:44 2011

Problem installing gui

Hello the list,

I am trying to install the Pacemaker GUI from source. When I try to execute bootstrap I get some errors, and the problem is that I don't have any documentation. So does anyone have an installation guide, or know the steps to follow?

Thanks in advance
Tariq
_______________________________________________
Pacemaker mailing list: Pacemaker@...
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker

Lars Marowsky-Bree | 2 May 21:56 2011

Re: Multi-site support in pacemaker (tokens, deadman, CTR)

On 2011-04-29T10:36:54, Andrew Beekhof <andrew@...> wrote:

> > As I understood it we had essentially reached consensus in Boston that
> > CIB replication would best be achieved by a pair of complementary
> > resource agents. I don't think we had a name then, but I'll call them
> > Publisher and Subscriber for the purposes of this discussion.
> >
> > The idea would be that Publisher exposes the <configuration/> section of
> > the CIB via a network daemon, preferably one that uses encryption.
> > Suppose this is something like lighttpd with SSL/TLS support.
> 
> I can also offer a Matahari (QMF) agent.
> The new Luci is going to be using it to get the config off remote
> machines anyway.

A pull model works for me.

> > This would be a simple primitive running exactly once in the
> > Pacemaker cluster, and only if that cluster holds the "ticket".

Yeah, so logically it would make sense to collocate it with - or
incorporate it into - the Cluster Ticket Registry, since the same is
true for that.
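
For illustration, the Publisher half could then be an ordinary
primitive in crm shell terms; everything here (the agent name
ocf:pacemaker:Publisher, its port parameter) is hypothetical, since no
such agent exists at this point:

    primitive cib-publisher ocf:pacemaker:Publisher \
            params port="8443" \
            op monitor interval="30s"

Collocating it with (or folding it into) the CTR would then just be one
more constraint on top of that.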

> > Subscriber would be the only resource (besides STONITH resources and
> > Slaves of master/slave sets) that can be active in a cluster that does
> > not hold the "ticket".
> or:
>    colocation $ticket -inf

Well, the CTR needs to be active once per cluster anyway, and it knows
which site is the current "master" for a given ticket (or whether its
own site isn't the master).

Maybe this component of the CTR can just switch state along with that
too?

Regards,
    Lars

--
Architect Storage/HA, OPS Engineering, Novell, Inc.
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

_______________________________________________
Pacemaker mailing list: Pacemaker@...
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker

Lars Marowsky-Bree | 2 May 22:26 2011

Re: Multi-site support in pacemaker (tokens, deadman, CTR)

On 2011-04-29T10:32:25, Andrew Beekhof <andrew@...> wrote:

> With such a long email, assume agreement for anything I don't
> explicitly complain about :-)

Sorry :-) I'm actually trying to write this up into a somewhat more
consistent document just now, which turns out to be surprisingly hard
... Not that easily structured. I assume anything is better than nothing
though.

> > It's an excellent question where the configuration of the Cluster Token
> > Registry would reside; I'd assume that there would be a
> > resource/primitive/clone (design not finished) that corresponds to the
> > daemon instance,
> A resource or another daemon like crmd/cib/etc?
> Could go either way I guess.

Part of my goal is to have this as an add-on on top of Pacemaker.
Ideally, short of the few PE/CIB enhancements, I'd love it if Pacemaker
wouldn't even have to know about this.

The tickets clearly can only be acquired if the rest of the cluster is
up already, so having this as a clone makes some sense, and provides
some monitoring of the service itself. (Similar to how ocfs2_controld is
managed.)
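
In crm shell terms that could look something like the sketch below; the
agent name and its parameters are hypothetical, since the CTR agent
doesn't exist yet:

    primitive ctr-daemon ocf:pacemaker:TicketRegistry \
            op monitor interval="10s"
    clone ctr-clone ctr-daemon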

> > (I think the word works - you can own a ticket, grant a ticket, cancel,
> > and revoke tickets ...)
> Maybe.
> I think token is still valid though.  Not like only one project in the
> world uses heartbeats either.
> (Naming a project after a generic term is another matter).

I have had multiple people confused by the "token" word in the CTR and
corosync contexts already. I just wanted to suggest killing it as early
as possible if we can ;-)

> > Site-internal partitioning is handled at exactly that level; only the
> > winning/quorate partition will be running the CTR daemon and
> > re-establish communication with the other CTR instances. It will fence
> > the losers.
> 
> Ah, so that's why you suggested it be a resource.

Yes.

> Question though... what about no-quorum-policy=ignore?

That was implicit somewhere later on, I think. The CTR must be able to
cope with multiple partitions of the same site, and would only grant the
ticket to one of them.

> > Probably it makes sense to add a layer of protection here to the CTR,
> > though - if several partitions from the same site connect (which could,
> > conceivably, happen), the CTRs will grant the ticket(s) only to the
> > partition with the highest node count (or, should these be equal,
> > lowest nodeid),
> How about longest uptime instead?  Possibly too variable?

That would work too, this was just to illustrate that there needs to be
a unique tie-breaker of last resort that is guaranteed to break said
tie.

> >> Additionally, when a split-brain happens, what about the existing
> >> stonith mechanism? Should the partition without quorum be stonithed?
> > Yes, just as before.
> Wouldn't that depend on whether a deadman constraint existed for one
> of the lost tickets?

Well, like I said: just as before. We don't have to STONITH anything if
we know that the nodes are clean. But, by the way, we still do, since we
don't trust nodes which failed. So unless we change the algorithm, the
partitions would get shot already, and nothing wrong with that ... Or
differently put: CTR doesn't require any change of behaviour here.

> Isn't kind=deadman for ordering constraints redundant though?

It's not required for this approach, as far as I can see, since this
only needs it for the ticket dependencies. I don't really care what else it
gets added to ;-)

> > Andrew, Yan - do you think we should allow _values_ for tickets, or
> > should they be strictly defined/undefined/set/unset?
> Unclear.  It might be nice to store the expiration (and/or last grant)
> time in there for admin tools to do something with.
> But that could mean a lot of spurious CIB updates, so maybe it's better
> to build that into the ticket daemon's api.

I think sometime later in the discussion I actually made a case for
certain values.

> > The ticket not being set/defined should be identical to the ticket being
> > set to "false/no", as far as I can see - in either case, the ticket is
> > not owned, so all resources associated with it _must_ be stopped, and
> > may not be started again.
> There is a startup issue though.
> You don't want to go fencing yourself before you can start the daemon
> and attempt to get the token.
> 
> But the fencing logic would presumably only happen if you DONT have
> the ticket but DO have an affected resource active.

Right. If you don't own anything that depends on the ticket that you
haven't got, nothing happens.

So no start-up issue - unless someone has misconfigured ticket-protected
resources to be started outside the scope of Pacemaker, but that's
deserved then ;-)

> > Good question. This came up above already briefly ...
> >
> > I _think_ there should be a special value that a ticket can be set to
> > that doesn't fence, but stops everything cleanly.
> 
> Again, wouldn't fencing only happen if a deadman dep made use of the ticket?

Right, all of the above assumed that one actually had resources that
depend on the ticket active. Otherwise, one wouldn't know which nodes to
fence for this anyway.

> Otherwise we probably want:
>    <token id=... loss-policy=(fence|stop|freeze) granted=(true|false) />
> 
> with the daemon only updating the "granted" field.

Yeah. What I wanted to hint at above though was an
owned-policy=(start|stop) to allow admins to cleanly stop the services
even while still owning the ticket - and still be able to recover from a
revocation properly (i.e., still fencing active resources).
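
Combining the two proposals, a concrete instance of the element under
discussion might read as follows; note that all of these attribute
names are design suggestions at this point, not implemented syntax:

    <token id="ticket-siteA" granted="true"
           loss-policy="fence" owned-policy="start"/>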

> > (Tangent - ownership appears to belong to the status section; the value
> > seems to belong to the cib->ticket section(?).)
> Plausible - since you'd not want nodes to come up and think they have tickets.
> That would also negate my concern about including the expiration time
> in the ticket.

Right. One thing that ties into this here is the "how do tickets expire
if the CTR dies on us" question, since then no one is around to revoke
them from the CIB.

I thought about handling this in the LRM, CIB, or PE (via the recheck
interval), but they all suck. The cleanest and most reliable way seems
to be to make death-of-ctr fatal for the nodes - just like
ocfs2_controld or sbd via the watchdog.

But storing the acquisition time in the CIB probably is quite useful for
the tools. I assume that typically we'll have <5 tickets around; an
additional time stamp won't hurt us.

Regards,
    Lars

--
Architect Storage/HA, OPS Engineering, Novell, Inc.
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

_______________________________________________
Pacemaker mailing list: Pacemaker@...
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker

