Don,

What I have seen in doing some testing with faster advertisement intervals and more/fewer advertisements leads me to a conclusion which may or may not be correct. I will offer my observation, though.

I have done testing of various techniques for providing survivable network environments. I have looked at standards-based solutions as well as proprietary enhancements. I have looked for solutions that can detect failures and recover in less than 1 second.

When I have tested VRRP using timer settings which adhere to the standard, I have seen very few problems. I can certainly introduce a load on the network which can cause packets to be dropped, but for the most part, devices are capable of transmitting and receiving an advertisement within the 3-second window.

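For reference, RFC 3768 (VRRP) defines the backup's timeout as Master_Down_Interval = (3 * Advertisement_Interval) + Skew_Time, with Skew_Time = (256 - Priority) / 256 seconds. That formula is where the roughly 3-second window at the default 1-second interval comes from. The sketch below only evaluates the formula. The `missed` parameter and the 0.1 s interval are hypothetical: RFC 3768 fixes the count at 3 and expresses the interval in whole seconds.

```python
def skew_time(priority: int) -> float:
    """Skew_Time in seconds, per RFC 3768: (256 - Priority) / 256."""
    if not 1 <= priority <= 254:
        raise ValueError("a backup's priority must be in 1..254")
    return (256 - priority) / 256


def master_down_interval(adver_interval: float, priority: int, missed: int = 3) -> float:
    """Seconds a backup waits without an advertisement before taking over.

    RFC 3768 fixes `missed` at 3; exposing it is a hypothetical extension
    for comparing other counts.
    """
    return missed * adver_interval + skew_time(priority)


if __name__ == "__main__":
    # Default 1 s interval vs. a hypothetical sub-second setting, priority 100.
    for interval in (1.0, 0.1):
        print(interval, round(master_down_interval(interval, priority=100), 4))
```

With priority 100 this gives 3.609375 s at the default 1 s interval. At 0.1 s the three missed advertisements account for only 0.3 s, so the unscaled skew term (about 0.61 s) is the larger part of the window.
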
When I have tested VRRP, and other protocols, using timer settings which permit sub-second failovers, there have been instances where flapping has occurred without my needing to introduce any load on the network. This observation leads me to believe that the issue is in the implementation of the particular survivability technology. With some technologies, it appeared that the protocol was activated based upon a timer and would not perform faster than a certain rate, regardless of the settings. In these cases, if the rate and number of missed messages were configured so that the detection window was shorter than the rate the protocol actually ran at, flapping would occur! I have also seen cases where the load on the network APPEARS to slow down the handling of the advertisements (i.e., it is not a high enough priority process), and flapping has occurred. In some cases, enabling logging has been enough to cause flapping to occur when using sub-second intervals.

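The two failure modes described above can be made concrete with a toy timeline model. The model assumes a backup that declares the master down after three configured intervals with no advertisement processed, and it ignores skew time for simplicity. `send_times`, `with_stall`, and `missed_deadlines` are hypothetical helpers. The 1 s rate floor and the 0.5 s stall are made-up numbers, not measurements of any vendor's implementation.

```python
from typing import List, Sequence


def send_times(configured: float, floor: float, count: int) -> List[float]:
    """Times at which a master actually sends advertisements.

    Hypothetical model: the implementation cannot run faster than one
    advertisement every `floor` seconds, whatever interval is configured.
    """
    step = max(configured, floor)
    return [i * step for i in range(count)]


def with_stall(arrivals: Sequence[float], start: float, duration: float) -> List[float]:
    """Advertisements arriving while the backup is stalled (for example,
    busy logging) are only processed when the stall ends."""
    end = start + duration
    return [end if start <= t < end else t for t in arrivals]


def missed_deadlines(processed: Sequence[float], configured: float, missed: int = 3) -> int:
    """Count gaps between processed advertisements longer than
    `missed` * `configured` seconds (skew time ignored). Each such gap is a
    point where the backup would declare the master down."""
    timeout = missed * configured
    times = sorted(processed)
    return sum(1 for a, b in zip(times, times[1:]) if b - a > timeout)


if __name__ == "__main__":
    # The same 0.5 s stall against a standard and a sub-second interval.
    for interval in (1.0, 0.1):
        arrivals = send_times(interval, floor=0.0, count=30)
        print(interval, missed_deadlines(with_stall(arrivals, 1.05, 0.5), interval))
    # Sub-second interval configured, but the implementation floors at 1 s.
    print(missed_deadlines(send_times(0.1, floor=1.0, count=10), 0.1))
```

In this model the same 0.5 s stall is harmless against a 3 s window but exceeds a 0.3 s one. A rate floor above the configured interval trips the timeout on every gap, even with no load on the network. This is consistent with the flapping described above.
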
Please note that I have not always been able to test with a particular vendor's top-of-the-line device. That is one reason why I think it is important to test these technologies with the devices, and images, that are planned for a particular computing environment.

I hope this has cast a little light on my interest in sub-second capability, and a little on what I have seen when looking at various technologies and implementations.

Bob Hott

First, I just want to mention that I think Bob has the answer to the "which time to use" question spot on. I recall that the wording in the spec wasn't quite right last time I read it (and it sounds like Bob agrees it needs a little work), but the description Bob has here in this e-mail agrees exactly with both my ponderings and my experience on what can go wrong and what the rules should be to make sure they don't.

But my purpose for writing this is to explore another question, one that comes up over and over and that I've never felt satisfied about. Bob says, "As you move into lower time intervals between Advertisements, missing 3 Advertisements is very likely and flapping occurs."

Now, I *don't* have experience in actual networks where these lower tolerances are required, so I ask this without having an opinion: is it in fact true that the per-packet failure rate increases in cases where a shorter failover time is desired? What is the characteristic of these environments that makes the standard of 3 packets less reliable than in an environment where a 3-second failover is acceptable?

It seems counterintuitive to me: when you retransmit faster, it causes a problem, and the solution to that problem is to retransmit *even faster*? But, as I say, it's probably just something about the target application that I don't yet understand.

Or is this just because people have always wanted to be able to configure the number of retransmissions, so now's the chance? I admit to being a little concerned with depending on only 3 retransmissions regardless of the rate, but I've never actually been able to justify any reason why three packets wouldn't be enough.

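The question above can be framed with a back-of-the-envelope model. The model assumes each advertisement is lost independently with probability p, which ignores the bursty losses real networks often show. A false failover then needs `missed` consecutive losses right after a received advertisement. At a fixed p, the chance per window falls as p**missed, while the number of windows per hour grows with the advertisement rate. `false_failovers_per_hour` is a hypothetical helper, and the 1% loss rate is a placeholder, not a measurement.

```python
def false_failovers_per_hour(p_loss: float, interval: float, missed: int = 3) -> float:
    """Approximate expected false failovers per hour.

    Assumes independent per-advertisement loss with probability `p_loss`.
    A run of at least `missed` consecutive losses that follows a received
    advertisement occurs at about
    (advertisements per hour) * (1 - p_loss) * p_loss ** missed.
    """
    if not 0.0 <= p_loss < 1.0:
        raise ValueError("p_loss must be in [0, 1)")
    per_hour = 3600.0 / interval
    return per_hour * (1.0 - p_loss) * p_loss ** missed


if __name__ == "__main__":
    for interval in (1.0, 0.1):
        for missed in (3, 5):
            rate = false_failovers_per_hour(0.01, interval, missed)
            print(f"interval={interval}s missed={missed}: {rate:.2e}/hour")
```

Under this independence assumption, going from 1 s to 100 ms multiplies the false-failover rate by ten for the same p, and each extra tolerated loss multiplies it by p. The model cannot say whether p itself rises in the environments that want fast failover, which is the open question.
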
-don