Paul Anderson | 1 Nov 2006 20:42

Testing configurations

>
>   Luke> what I am currently looking for is a way to keep the configs
>   Luke> in cvs or similar, then automatically deploy first to test,
>   Luke> then to production.
>
> We actually have a paper in at LISA this year about using this sort of
> technique with bcfg2. There are a bunch of corner cases that make
> this a little tricky, but there are a ton of nifty things that you can
> do (including configuration transactions and workflows) if you do it
> right.

We have been doing this for a while. Recently, it has been formalised  
more & we now have several different flavours of machines with  
different levels of "stability". Production machines get a weekly  
update which has been in test on "development" machines for the  
previous week.
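
A minimal sketch of this kind of staged release, assuming each release is a labeled configuration point with a recorded date on which it first went to the "development" machines. The function name, the release labels, and the seven-day soak period are illustrative assumptions, not how LCFG actually implements it.

```python
from datetime import date, timedelta

# Assumption: a release must spend one week on development machines
# before production machines pick it up.
SOAK_PERIOD = timedelta(days=7)

def release_for(flavour, releases, today):
    """Return the label of the release a machine of this flavour should run.

    releases: list of (label, date_first_deployed_to_development), oldest first.
    """
    if flavour == "development":
        # Development machines always track the newest release.
        return releases[-1][0]
    # Production machines only take releases that have soaked long enough.
    eligible = [label for label, start in releases
                if today - start >= SOAK_PERIOD]
    return eligible[-1] if eligible else None

# Hypothetical weekly releases.
releases = [
    ("r1", date(2006, 10, 18)),
    ("r2", date(2006, 10, 25)),
    ("r3", date(2006, 11, 1)),
]
today = date(2006, 11, 1)
print(release_for("development", releases, today))
print(release_for("production", releases, today))
```

With these dates the development machines run the release cut today, while production runs the one that has already spent a week in test.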

(Copied to Chris who has been managing all this stuff & might not be  
on the list)

    Paul
Narayan Desai | 1 Nov 2006 20:55

Re: Testing configurations

>>>>> "Paul" == Paul Anderson <dcspaul <at> inf.ed.ac.uk> writes:

  >> 
  Luke> what I am currently looking for is a way to keep the configs
  Luke> in cvs or similar, then automatically deploy first to test,
  Luke> then to production.
  >> 
  >> We actually have a paper in at LISA this year about using this
  >> sort of technique with bcfg2. There are a bunch of corner cases
  >> that make this a little tricky, but there are a ton of nifty
  >> things that you can do (including configuration transactions and
  >> workflows) if you do it right.

  Paul> We have been doing this for a while. Recently, it has been
  Paul> formalised more & we now have several different flavours of
  Paul> machines with different levels of "stability". Production
  Paul> machines get a weekly update which has been in test on
  Paul> "development" machines for the previous week.

Out of curiosity, does the LCFG server know anything about the
revision control system, or is the change management functionality
handled entirely externally?
 -nld
Paul Anderson | 2 Nov 2006 08:34

Re: Testing configurations


On 1 Nov 2006, at 19:55, Narayan Desai wrote:

>>>>>> "Paul" == Paul Anderson <dcspaul <at> inf.ed.ac.uk> writes:
>
>>>
>   Luke> what I am currently looking for is a way to keep the configs
>   Luke> in cvs or similar, then automatically deploy first to test,
>   Luke> then to production.
>>>
>>> We actually have a paper in at LISA this year about using this
>>> sort of technique with bcfg2. There are a bunch of corner cases
>>> that make this a little tricky, but there are a ton of nifty
>>> things that you can do (including configuration transactions and
>>> workflows) if you do it right.
>
>   Paul> We have been doing this for a while. Recently, it has been
>   Paul> formalised more & we now have several different flavours of
>   Paul> machines with different levels of "stability". Production
>   Paul> machines get a weekly update which has been in test on
>   Paul> "development" machines for the previous week.
>
> Out of curiosity, does the LCFG server know anything about the
> revision control system, or is the change management functionality
> handled entirely externally?

It is external. I think it is possible to write conditionals which  
depend on the release version, but that doesn't seem like a good idea.
Of course, the individual aspects (headers) are version controlled  
separately, so it is possible to pull back a particular aspect if you  
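screw something up, but the important thing here is to create a stable,
labeled configuration point for the whole site.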

Narayan Desai | 2 Nov 2006 13:28

Re: Testing configurations

>>>>> "Paul" == Paul Anderson <dcspaul <at> inf.ed.ac.uk> writes:

  >> Out of curiosity, does the LCFG server know anything about the
  >> revision control system, or is the change management
  >> functionality handled entirely externally?

  Paul> It is external. I think it is possible to write conditionals
  Paul> which depend on the release version, but that doesn't seem
  Paul> like a good idea.  Of course, the individual aspects (headers)
  Paul> are version controlled separately, so it is possible to pull
  Paul> back a particular aspect if you screw something up, but the
  Paul> important thing here is to create a stable, labeled
  Paul> configuration point for the whole site.

We found that if you actually built some logic into the server, you
could use revision data as a proxy for an independent time variable,
and write logic that consumes and modifies the repo based on it. (This
is the topic of our paper this year) I am pretty sure that it is
necessary for the server to actually have some notion about the
revision data...
 -nld
Paul Anderson | 2 Nov 2006 13:45

Re: Testing configurations


On 2 Nov 2006, at 12:28, Narayan Desai wrote:

>>>>>> "Paul" == Paul Anderson <dcspaul <at> inf.ed.ac.uk> writes:
>
>>> Out of curiosity, does the LCFG server know anything about the
>>> revision control system, or is the change management
>>> functionality handled entirely externally?
>
>   Paul> It is external. I think it is possible to write conditionals
>   Paul> which depend on the release version, but that doesn't seem
>   Paul> like a good idea.  Of course, the individual aspects (headers)
>   Paul> are version controlled separately, so it is possible to pull
>   Paul> back a particular aspect if you screw something up, but the
>   Paul> important thing here is to create a stable, labeled
>   Paul> configuration point for the whole site.
>
> We found that if you actually built some logic into the server, you
> could use revision data as a proxy for an independent time variable,
> and write logic that consumes and modifies the repo based on it. (This
> is the topic of our paper this year) I am pretty sure that it is
> necessary for the server to actually have some notion about the
> revision data...

I think I'll need to read the paper :-)

Can you give a simple example?

   Paul

Narayan Desai | 2 Nov 2006 17:04

Re: Testing configurations

>>>>> "Paul" == Paul Anderson <dcspaul <at> inf.ed.ac.uk> writes:

  Paul> On 2 Nov 2006, at 12:28, Narayan Desai wrote:

>>>>>> "Paul" == Paul Anderson <dcspaul <at> inf.ed.ac.uk> writes:
  >> 
  >>>> Out of curiosity, does the LCFG server know anything about the
  >>>> revision control system, or is the change management
  >>>> functionality handled entirely externally?
  >> 
  Paul> It is external. I think it is possible to write conditionals
  Paul> which depend on the release version, but that doesn't seem
  Paul> like a good idea.  Of course, the individual aspects (headers)
  Paul> are version controlled separately, so it is possible to pull
  Paul> back a particular aspect if you screw something up, but the
  Paul> important thing here is to create a stable, labeled
  Paul> configuration point for the whole site.
  >> 
  >> We found that if you actually built some logic into the server,
  >> you could use revision data as a proxy for an independent time
  >> variable, and write logic that consumes and modifies the repo
  >> based on it. (This is the topic of our paper this year) I am
  >> pretty sure that it is necessary for the server to actually have
  >> some notion about the revision data...

  Paul> I think I'll need to read the paper :-)

It is a fun one ;)

  Paul> Can you give a simple example?

Paul Anderson | 2 Nov 2006 17:26

Re: Testing configurations

>   Paul> Can you give a simple example?
>

> As for how this is used, consider the following case. You want to
> decommission a service, say ntp since it is pretty close to
> stateless. The three steps are to bring up the new service, make all
> clients use the new service instance, and decommission the old service
> instance. You want to build this as a transaction, so that clients
> won't begin using the service before it exists, or continue to use the
> old service after it has been turned off.
>
> Here is how we implemented this. First, you commit three revisions to
> the svn repository. In the first, you enable the new ntp server. Say
> this gets repo revision 301. Then you commit a change that points
> clients at the new server; this gets revision 302. Then you commit a
> change that disables the old server; this gets revision 303.
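
The three-revision sequence above can be sketched as a small gate on the server side. This is only an illustration under stated assumptions, not the bcfg2 implementation: `next_target_revision` and the `client_revisions` mapping (client name to the last revision that client is known to be clean at) are hypothetical, and the rule "advance one revision only when every client has converged" is a deliberate simplification.

```python
def next_target_revision(current_target, final_revision, client_revisions):
    """Advance the site-wide target revision by one step, but only once
    every client has converged on the current target; otherwise hold."""
    if current_target >= final_revision:
        return current_target  # transaction complete
    if all(rev >= current_target for rev in client_revisions.values()):
        return current_target + 1
    return current_target

# Revisions from the example: 301 enables the new ntp server,
# 302 points clients at it, 303 disables the old server.
clients = {"ntp-new": 301, "web1": 301, "web2": 300}  # hypothetical hosts

target = 301
target = next_target_revision(target, 303, clients)
print(target)  # web2 has not reached 301 yet, so the target is held

clients["web2"] = 301
target = next_target_revision(target, 303, clients)
print(target)  # all clients are clean at 301, so move on to 302
```

Treating the revision number as the clock means no client is pointed at the new server before revision 301 has been applied everywhere it matters, and the old server is not switched off until revision 302 has settled.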

Oooh. We had endless discussions about doing exactly this kind of  
sequencing for the European DataGrid, but we never implemented it  
because we couldn't solve the problems of interference between  
updates with different priorities. I'd be very interested if you have  
something that works in non-trivial cases.

In practice, there are always multiple configuration changes  
happening. These are being made by different people, and they have  
different "urgency" levels. If some server has gone away and you need  
to reconfigure the clients, you can't wait - this means that you  
can't just do this kind of operation by considering the "revision" of  
the whole configuration - you have to deal with revisions on  
"aspects": For example ...

Narayan Desai | 2 Nov 2006 17:58

Re: Testing configurations

>>>>> "Paul" == Paul Anderson <dcspaul <at> inf.ed.ac.uk> writes:

  Paul> Can you give a simple example?
  >> 

  >> As for how this is used, consider the following case. You want to
  >> decommission a service, say ntp since it is pretty close to
  >> stateless. The three steps are to bring up the new service, make
  >> all clients use the new service instance, and decommission the
  >> old service instance. You want to build this as a transaction, so
  >> that clients won't begin using the service before it exists, or
  >> continue to use the old service after it has been turned off.
  >> 
  >> Here is how we implemented this. First, you commit three
  >> revisions to the svn repository. In the first, you enable the new
  >> ntp server. Say this gets repo revision 301. Then you commit a
  >> change that points clients at the new server; this gets revision
  >> 302. Then you commit a change that disables the old server; this
  >> gets revision 303.

  Paul> Oooh. We had endless discussions about doing exactly this kind
  Paul> of sequencing for the European DataGrid, but we never
  Paul> implemented it because we couldn't solve the problems of
  Paul> interference between updates with different priorities. I'd be
  Paul> very interested if you have something that works in
  Paul> non-trivial cases.

Right, this is a fundamental limitation of the approach. While we
haven't come up with a good way to compose workflows, we have
high-enough granularity information to figure out how to nudge the

Paul Anderson | 2 Nov 2006 18:25

Re: Testing configurations

> I guess that one of the philosophical points inherent in this
> approach is that we are willing to accept partial automation for
> complex problems. In some sense, this makes previously hard tasks
> merely inconvenient; so we see it as a net win, even if it is
> administrator time intensive in some cases...If nothing else, it tells
> you where the problems will occur when you go through transitions
> unsafely.

Yuck. Sorry :-)

In practice, it means:

Any sequenced change of significant size is likely to remain pending  
for some time. (if you have 100 machines, there is a good chance that  
there will always be one or more machines unavailable).
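
A rough back-of-the-envelope check of that claim, assuming (purely for illustration) that each machine is independently reachable 99% of the time. Neither the independence assumption nor the 99% figure comes from the thread.

```python
def prob_some_machine_down(n_machines, availability):
    """Probability that at least one of n independent machines is
    unavailable, given the per-machine availability (0..1)."""
    return 1 - availability ** n_machines

# 100 machines, each assumed to be up 99% of the time.
print(round(prob_some_machine_down(100, 0.99), 3))  # 0.634
```

Even with fairly reliable machines, a change that must reach all 100 of them will find at least one missing roughly two times out of three.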

During this period, *any* other change that you want to make to the  
configuration of any machines involves the person manually sorting  
out the combination of his/her change with something complicated  
created by someone else, which is halfway through an automatic process?

I can't believe that this isn't a problem - are you using this in anger?

    Paul
Narayan Desai | 2 Nov 2006 20:09

Re: Testing configurations

>>>>> "Paul" == Paul Anderson <dcspaul <at> inf.ed.ac.uk> writes:

  >> I guess that one of the philosophical points inherent in this
  >> approach is that we are willing to accept partial automation for
  >> complex problems. In some sense, this makes previously hard tasks
  >> merely inconvenient; so we see it as a net win, even if it is
  >> administrator time intensive in some cases...If nothing else, it
  >> tells you where the problems will occur when you go through
  >> transitions unsafely.

  Paul> Yuck. Sorry :-)

  Paul> In practice, it means:

  Paul> Any sequenced change of significant size is likely to remain
  Paul> pending for some time. (if you have 100 machines, there is a
  Paul> good chance that there will always be one or more machines
  Paul> unavailable).

Yeah, with two mitigating factors. One, we have a good handle on which
clients have outstanding changes wrt the transaction, and we have the
ability to force the process through. We also have activity data so
that we can spot clients that are offline or have gone pretty
easily. Offline clients will almost always come up clean enough to be
ignored in these workflows, so manually forcing things through is not
universally problematic. We also have really fine-grained statistics
information, so it is possible to limit the scope of state machine
triggers. In the example workflow I gave, the first and last steps are
single point changes, and the middle one applies to all
systems. Updating the ntp config is a known quantity, so we know that
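
The mitigation described in this message (knowing which clients still have outstanding changes, and using activity data to spot the offline ones) might be sketched as follows. The data layout, the two-day offline threshold, and the function names are assumptions made for illustration; they are not taken from bcfg2.

```python
from datetime import datetime, timedelta

def classify_pending(clients, target_revision, now,
                     offline_after=timedelta(days=2)):
    """Split the clients that have not reached target_revision into
    those still active (worth waiting for) and those that look offline.

    clients: dict of name -> (last_clean_revision, last_seen_datetime).
    """
    waiting, offline = [], []
    for name, (revision, last_seen) in sorted(clients.items()):
        if revision >= target_revision:
            continue  # already converged, nothing outstanding
        if now - last_seen > offline_after:
            offline.append(name)
        else:
            waiting.append(name)
    return waiting, offline

def can_force(clients, target_revision, now):
    """The step may be forced through once only offline clients remain."""
    waiting, _ = classify_pending(clients, target_revision, now)
    return not waiting

now = datetime(2006, 11, 2, 20, 0)
clients = {  # hypothetical hosts
    "a": (302, now),
    "b": (301, now - timedelta(hours=1)),
    "c": (300, now - timedelta(days=5)),
}
print(classify_pending(clients, 302, now))
print(can_force(clients, 302, now))
```

Here "b" is active but behind, so the step should wait for it, while "c" has been silent for days and can be left out of the decision.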

