RE: Questions on draft-ietf-vrrp-ipv4-timers-02.txt

Steve,
    I agree that the draft needs another round, with wording added to indicate how the timer should be calculated (whether to use the Master's settings or the locally configured values/granularity).
 
    I am on the fence with regard to passing the advertisement count. I think that I am leaning towards wanting to pass the failover time, as opposed to an advertisement interval, and then locally define how many advertisements (keep-alives, if you will) to send during that failover time period. The information that is passed would then be the granularity of the failover time and the failover time.
 
    I would also recommend that the IPv4 and IPv6 versions mirror each other as much as possible.
 
    Thanks for your input. I would be interested in your thoughts (as well as the thoughts of the mailing list) on using a failover time vs. the advertisement interval.
 
Bob Hott
 

Robert (Bob) W. Hott
NSWC-DD
Code B35, Bldg. 1500A/122A
17320 Dahlgren Road
Dahlgren, VA 22448-5100
540-653-1497 (W)
540-653-8673 (FAX)
robert.hott <at> navy.mil (E-mail)

-----Original Message-----
From: Steve Bates [mailto:Steve.Bates <at> alcatel.com]
Sent: Wednesday, March 29, 2006 12:10
To: Hott, Robert W CIV B35-Branch; vrrp <at> ietf.org
Cc: Odonoghue, Karen F CIV B35-Branch
Subject: RE: [VRRP] Questions on draft-ietf-vrrp-ipv4-timers-02.txt

Bob,
 
In regard to the responses for both 1 & 2, that's what I'm driving at.  We've changed the rules to allow a mismatched packet to reach the state machine but haven't addressed in the draft how the state machine should react.
 
For 3, another thought: if aig were some logarithmic function, 1/(10 ** x) where x is the value of the field, we could support values for as long as there are bits without modifying an RFC. On the other hand, fixed values do conserve bits.  I have no strong feelings one way or the other.
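
As a minimal sketch of the logarithmic idea for item 3, assuming the field value x maps directly to a timer unit of 1/(10 ** x) seconds. The function name is hypothetical, and this is not the fixed-value encoding in the draft.

```python
from fractions import Fraction


def granularity_seconds(aig: int) -> Fraction:
    """Timer unit in seconds for a power-of-ten encoded granularity field.

    Assumption: aig is the raw field value x and the unit is 1 / (10 ** x).
    """
    if aig < 0:
        raise ValueError("aig must be non-negative")
    return Fraction(1, 10 ** aig)


# Under this assumed mapping: 0 -> seconds, 1 -> deciseconds,
# 2 -> centiseconds, 3 -> milliseconds, 6 -> microseconds.
units = {x: granularity_seconds(x) for x in (0, 1, 2, 3, 6)}
```

With this mapping, a two-bit field reaches milliseconds and a three-bit field reaches beyond microseconds, which is the trade-off against fixed values mentioned above.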
 
For 4, the reason I say we don't gain much is because the value being passed is the Master's value.  Since we've already shown in 2 that this value can be meaningless to a less capable backup virtual router the question is how much autonomy should a backup have?  One advantage to a configured advertisement count is that a backup on an unreliable link can adjust its value independently to avoid flapping.
 
And finally, for 5, not passing the advertisement count would also free up space.  IPv4 and IPv6 should be as similar as possible.
 
Steve
-----Original Message-----
From: Hott, Robert W CIV B35-Branch [mailto:robert.hott <at> navy.mil]
Sent: Monday, March 27, 2006 9:50 PM
To: Steve Bates; vrrp <at> ietf.org
Cc: Odonoghue, Karen F CIV B35-Branch
Subject: RE: [VRRP] Questions on draft-ietf-vrrp-ipv4-timers-02.txt

Steve,

  Sorry I missed your comments the first go around. I did get them
and missed responding. Thank you for hitting me with them again. You
have some good questions. I think the draft needs to have wording
added to clarify what needs to happen. I also think some discussion
on the reflector is in order with regard to timers, granularity, and
counts. Thank you for the comments. See my response and observations
below. Hopefully this will get some discussion going!!

Bob Hott

-----Original Message-----
From: Steve Bates [mailto:Steve.Bates <at> alcatel.com]
Sent: Wednesday, March 22, 2006 18:04
To: Hott, Robert W CIV B35-Branch; vrrp <at> ietf.org
Subject: [VRRP] Questions on draft-ietf-vrrp-ipv4-timers-02.txt


Hi Bob,

I've posted these comments before but I'll rephrase them based on the 02
draft.

1) Back in early December there was a flurry of comments about mismatched
advertisement intervals causing multiple masters.  The consensus seemed to
be that virtual routers of a lower priority would ignore a mismatch if the
higher priority interval was less than their configured interval and make
the higher priority interval their own operational value when it was greater
than their configured interval.  I was under the impression, based on
Radia's response to your original December 2nd post, that this would be
reflected in the draft.  Other than the omission of the phrase "the receiver
MUST discard the packet" in the final paragraph of section 4.1 I don't see
anything about this. 

Steve, removing the phrase from Section 4.1 about discarding the packet
was needed so that mismatches would get handled in the STATE MACHINE.
If the Advertisements were discarded, multiple Masters could occur.
Now, with regard to which timer to use, I think that the consensus was
to use the MASTER values when there was a mis-match. Clock granularity could
impact what is actually used, see my response to your question #2, below.
When I look at your questions and the draft, I think the draft needs
to better identify and discuss the values used; whether it is the
configured value, the value received from the MASTER, etc...
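
The two readings of the mismatch handling in item 1 can be sketched side by side. This is only an illustration: both function names are hypothetical, both intervals are assumed to be expressed in the same unit, and neither function is text from the draft.

```python
def interval_per_item_1(configured: int, received: int) -> int:
    """Rule as recalled in item 1 for a lower-priority router.

    Ignore a shorter interval from the higher-priority router, and adopt
    a longer one as the operational value.
    """
    return max(configured, received)


def interval_per_master(configured: int, received: int) -> int:
    """Rule as recalled in the reply: always use the Master's value."""
    return received


# A Backup configured for 100 units that hears a Master advertising 50:
assert interval_per_item_1(100, 50) == 100   # mismatch ignored
assert interval_per_master(100, 50) == 50    # Master's value adopted
```

The two rules agree whenever the Master's interval is the longer one; they differ only when the Master advertises a shorter interval than the Backup has configured.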


2) This may become a greater issue with the addition of advertising interval
granularity.  It's conceivable that one implementation might not be able to
support as fine(?) a granularity as another, while both are type 2 virtual
routers.  For example, suppose a higher priority virtual router A is
configured with adver_cnt = 3, aig = 1, and adver_int = 1, while a lower
priority virtual router B is configured with adver_cnt = 3, aig = 2, and
adver_int = 1.  Virtual router B MUST accept virtual router A's values to
avoid flapping - unless it discards the FAST ADVERTISEMENT and creates a two
master situation.  Conversely, if the priorities are reversed B's values are
useless to A if A is incapable of supporting the finer granularity but at
least A will maintain its backup state.  Am I mistaken?


Okay, so here is how it should work, not saying that the wording is in
place:

If Router A is Master (or any router with a granularity less than its
Backups), then the Master will propagate its values to the Backups.
The Backups, in this case Router B, should be able to set its
timer (Master_Down_Timer) based on the received adver_cnt and adver_int
in the advertised clock granularity (centiseconds) units, since its
own clock granularity was milliseconds. This should work fine.

Now for the opposite situation, where the Master has a clock
granularity greater than its Backups. So, Router B is now the Master
and Router A is the Backup. Here, Router B sends the advertisement
once every millisecond. Since Router A cannot set its timer to 1
millisecond, it should set its timer based upon its own lowest setting.
In this case, 1 centisecond should be used. Router A won't failover to
become Master as fast as Router B would like (3 milliseconds) but it
will eventually become Master, as soon as it can. Once Router B
becomes the Master, if it does, it would use its configured settings.

I think the draft needs to better describe which settings should be
used, based upon clock granularity. Thus, the Master_Down_Interval
needs to reflect the Master settings and the internal settings.  
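
The behaviour described for the two cases above can be sketched as follows. This is a rough illustration, not draft text: the function and parameter names are hypothetical, Skew_Time is left out for brevity, and rounding up to the Backup's own clock unit is an assumption about what "its own lowest setting" means.

```python
def master_down_interval_us(adver_cnt: int, adver_int: int,
                            master_unit_us: int, local_unit_us: int) -> int:
    """Backup's Master_Down timer in microseconds.

    adver_int counts units of master_unit_us (the advertised granularity);
    local_unit_us is the finest unit the Backup's clock can support.
    """
    wanted_us = adver_cnt * adver_int * master_unit_us
    # Round up to a whole number of local clock units, never below one unit.
    local_units = max(1, -(-wanted_us // local_unit_us))
    return local_units * local_unit_us


CENTISECOND_US = 10_000
MILLISECOND_US = 1_000

# Router A (centiseconds) is Master, Router B (milliseconds) is Backup:
a_master = master_down_interval_us(3, 1, CENTISECOND_US, MILLISECOND_US)
# Router B (milliseconds) is Master, Router A (centiseconds) is Backup:
b_master = master_down_interval_us(3, 1, MILLISECOND_US, CENTISECOND_US)
```

In the first case the Backup can honour the Master's 3 centiseconds exactly. In the second the Backup cannot time 3 milliseconds, so it falls back to one unit of its own clock, 1 centisecond.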

3) The two bit granularity field seems a bit shortsighted.  A) There are
some RTOSes where a "tick" is defined as 1/60th of a second.  This makes
achieving centisecond granularity difficult.  A decisecond value would be
nice.  B) But if we do that we've used up all the values in the field.  Ten
years from now we'll be on terabit networks and someone may want microsecond
granularity.  Another bit might be a good idea.

I tend to agree with you about the need for more bits for granularity.
This kind of relates to your last question, below. I figured that
the last bit would be used for microseconds, but wondered about the
decisecond need. I thought that the 10 bits for centiseconds would
cover the decisecond requirement.
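
The tick arithmetic behind item 3A can be checked quickly. The sketch assumes timers can only fire on whole ticks of 1/60 second; the helper names are hypothetical.

```python
from fractions import Fraction

TICK = Fraction(1, 60)  # RTOS tick length in seconds, as given in item 3A


def ticks_for(interval: Fraction) -> Fraction:
    """Number of ticks that make up the given interval in seconds."""
    return interval / TICK


def fits_on_tick_boundary(interval: Fraction) -> bool:
    """True when the interval is a whole number of ticks."""
    return ticks_for(interval).denominator == 1


centisecond = Fraction(1, 100)
decisecond = Fraction(1, 10)

assert ticks_for(centisecond) == Fraction(3, 5)   # shorter than one tick
assert ticks_for(decisecond) == 6                 # exactly six ticks
```

A centisecond is less than one tick on such a system, while a decisecond lands exactly on a tick boundary, which is why a decisecond value is attractive there.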

4) I don't dispute the desirability of  a configurable advertisement count
but I'm not sure we gain much passing it in the advertisement.

The issue that I have with the current Master_Down_Interval is that it
is based upon a fixed Advertisement Count of 3. As you move into lower
time intervals between Advertisements, missing 3 Advertisements is very
likely and flapping occurs. I think that missing 3 Advertisements was
reasonable when the failovers were in the order of seconds. I guess
one option would be to change the way that the advertisement_interval
is calculated and only pass the acceptable outage period ( failover_time).
What do you / everyone think about that? So, in the Advertisement, you
would pass the desired failover time. On the Master, it is configured to
send Advertisements every
(Failover_Time / Configured_number_per_interval). The Backups would
only care about Failover_Time and clock granularity.
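
A minimal sketch of the failover-time proposal, assuming times are carried as integer milliseconds. The function name and the 600 ms figure are illustrative only, and rounding the interval down is an assumption made so that the configured number of advertisements fits inside the failover time.

```python
def advertisement_interval_ms(failover_time_ms: int,
                              adverts_per_failover: int) -> int:
    """Master-side send interval derived from the advertised failover time.

    Implements Failover_Time / Configured_number_per_interval, rounded down.
    """
    if adverts_per_failover < 1:
        raise ValueError("at least one advertisement per failover period")
    return failover_time_ms // adverts_per_failover


# The Master advertises only the failover time; the count stays local.
failover_time_ms = 600
three_per_period = advertisement_interval_ms(failover_time_ms, 3)
six_per_period = advertisement_interval_ms(failover_time_ms, 6)

# A Backup arms its timer from the failover time alone.
backup_timer_ms = failover_time_ms
```

Changing the local count from 3 to 6 halves the send interval without changing anything the Backups need to know.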


5) For the VRRPv3 packet there are only 4 bits available since the
advertisement interval uses 12.  Do you have an idea how to migrate this
draft to the IPv6 case?

I think the right way to handle this for IPv6 is to do the same thing
for both V4 and V6. If the discussion on your 4th question moves the
standard to specify a failover time, then I think there will be
enough room to specify a clock granularity. My original proposal
was not to introduce a new version of VRRP, but add a new
message type. If IPv4 and IPv6 were aligned, maybe it would be
better to use a single type advertisement and introduce a new
version. What do folks think?
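
To make the space constraint in item 5 concrete, here is a sketch of a 16-bit word split into 4 spare bits and a 12-bit advertisement interval. The 4/12 split comes from item 5; placing the spare bits in the high-order position and using them for a granularity code are assumptions, and the function names are hypothetical.

```python
def pack_timer_word(spare: int, adver_int: int) -> int:
    """Pack 4 spare bits and a 12-bit advertisement interval into 16 bits."""
    if not 0 <= spare < 16:
        raise ValueError("spare field is 4 bits")
    if not 0 <= adver_int < 4096:
        raise ValueError("advertisement interval is 12 bits")
    return (spare << 12) | adver_int


def unpack_timer_word(word: int) -> tuple[int, int]:
    """Split a 16-bit word back into (spare, adver_int)."""
    return (word >> 12) & 0xF, word & 0xFFF


# A hypothetical 2-bit granularity code of 2 with an interval of 100:
word = pack_timer_word(2, 100)
assert word == 0x2064
assert unpack_timer_word(word) == (2, 100)
```

A two-bit granularity code fits in the spare bits with two bits left over, but an advertisement count of any useful range would not fit alongside it.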


Steve Bates

Bob Hott

_______________________________________________
vrrp mailing list
vrrp <at> ietf.org
https://www1.ietf.org/mailman/listinfo/vrrp

RE: Questions on draft-ietf-vrrp-ipv4-timers-02.txt

Don,
    I appreciate your thoughts. See my comments below.
 
-----Original Message-----
From: Don Provan [mailto:dprovan <at> bivio.net]
Sent: Wednesday, March 29, 2006 13:48
To: Hott, Robert W CIV B35-Branch; 'Steve Bates'; vrrp <at> ietf.org
Cc: Odonoghue, Karen F CIV B35-Branch
Subject: RE: [VRRP] Questions on draft-ietf-vrrp-ipv4-timers-02.txt

OK, please excuse me for doubting you, but I'm
afraid this response has kinda supported my fears.
The issues you're talking about -- and I note
turning on logging as a specific case -- should
have the effect of delaying the processing of each
individual packet including the one that should
have prevented the flap. That suggests that the
overall timeout is too small, *not* that too few
packets were transmitted.
 
But since you've obviously investigated this way
beyond anything I've done, perhaps you already have
the more detailed data that would convince me.
First, are you sure your tests were checking the
number of packets transmitted without changing the
length of the timeout? And, second, were you able
to confirm that flapping was caused by packets that
were lost and not by packets that were delayed?
 
I'm not arguing against being able to configure
the number of retransmissions, as long as everyone
agrees VRRP needs it. I just want to make sure we
understand (and the spec expresses) what this really
accomplishes. In my experience (which was, admittedly,
limited to my implementation), I found that limits
to the timeout were *always* caused by packet latency,
never, ever by packet loss. I have seen problems
caused by packet loss, but with normal intervals
as much as with small ones.

I have done some testing and I think I have seen situations where
the ability to have additional advertisements has prevented
flapping. I did not perform enough analysis to determine if having
more advertisements during an overall time period managed to get an
advertisement through a queue where fewer during the same
overall time period did not, OR if some of the advertisements
were dropped due to queue overload. I suspect that the issue
was latency related, thus getting an advertisement in the
queue sooner might help but would not guarantee that flapping
would not occur.
 
For your info, I ran several tests where I had a desire to keep
the failover time under .6 seconds (this is an example, so as not to
point fingers at a specific implementation). To do that,
with the standard implementation, an advertisement interval
of .2 seconds was used. Flapping could be made to occur
under network loading or intense activity on the Master
router (say a denial of service type of attack, logging, or
network management accesses). Using vendor specific
options for sending more advertisements, the
advertisement interval was lowered to .1 second and
a total of 6 could be missed prior to a Backup taking
over. The stability of the protocol was greatly increased.
 
Is latency the issue, as opposed to dropped packets? You
are probably correct, that it is. Either way, it is a problem.
If packets aren't lost, getting them in the queue sooner
did help. There are environments that have a specific failover
requirement. The flexibility to send more advertisements
over the same time interval appears to help the stability of
the protocol. As you stated below, Master routers can be
set to inappropriate values and flapping will occur when this
happens.
 
On another note, I think we agree that the lowest,
reliable timeout period depends on many factors
in the implementation and in the environment; it
isn't something that can be defined by the
protocol. I wonder if, at these rates, we can or
should add some rules for "flapping recovery."
When a backup inappropriately takes over the VR,
the correct master knows it. I wonder if we should
add something for a master to announce that it was
inappropriately replaced and set a higher timeout
to avoid future mistakes? Just a thought off the
top of my head....
 
Yes, we do agree! You are right, that the real Master
does know when it has been replaced by another. I
could envision a configuration option that would allow
the real Master to alter its timeout, but I would want
the ability to prevent this action (raising the
timeout) too. I certainly think that the real Master
should report the potential flapping.
 
-don 
 
Thanks for your insight! I appreciate your perspective. My view is from
the end user and your view is not always obvious to me. Thanks again.
 
Bob Hott

Robert (Bob) W. Hott
NSWC-DD
Code B35, Bldg. 1500A/122A
17320 Dahlgren Road
Dahlgren, VA 22448-5100
540-653-1497 (W)
540-653-8673 (FAX)
robert.hott <at> navy.mil (E-mail)

Mukesh Gupta | 1 May 21:45 2006

FW: IETF Meeting Survey

FYI.

Please take time to complete the survey.

-----Original Message-----
From: Ray Pelletier [mailto:rpelletier <at> isoc.org] 
Sent: Monday, May 01, 2006 8:25 AM
To: wgchairs <at> ietf.org
Subject: IETF Meeting Survey

All;
It has been suggested that if I really wanted to get feedback on the 
Meeting Survey (and I do)  I should have it forwarded by the WG Chairs 
to the working groups - so this is a request that you ask members of the
working groups to complete a short survey that focuses primarily, but 
not exclusively, on their meeting experience in Dallas. 

I truly want the info to make the changes members of the community want,
if possible and greatly appreciate your assistance in this regard.

Those interested in taking the survey can find it at:
http://www.surveymonkey.com/s.asp?u=649182049947

Thanks
Ray Pelletier
IETF Administrative Director
