Jeff Tantsura | 3 May 05:10 2016

WGLC on draft-ietf-rtgwg-rlfa-node-protection

Dear RTGWG,

The authors of draft-ietf-rtgwg-rlfa-node-protection have told us that
the draft is ready for working group last call (WGLC).

This email starts the WGLC for draft-ietf-rtgwg-rlfa-node-protection.
The call will close on Monday, May 16.  Please state whether or not you
support the advancement of this draft.

Thanks,
Jeff and Chris
Jeff Tantsura | 1 May 10:18 2016

RTGWG draft minutes available

Hi RTGWG,

The draft minutes for the RTGWG meeting at IETF 95 are now available.  Please let me know if you have any comments.

Jeff & Chris

rtgwg - New Meeting Session Request for IETF 96


A new meeting session request has just been submitted by Jeff Tantsura, a Chair of the rtgwg working group.

---------------------------------------------------------
Working Group Name: Routing Area Working Group
Area Name: Routing Area
Session Requester: Jeff Tantsura

Number of Sessions: 2
Length of Session(s):  2 Hours, 2.5 Hours
Number of Attendees: 150
Conflicts to Avoid: 
 First Priority: isis mpls ospf pce idr spring
 Second Priority: bess l3sm bier teas ccamp i2rs
 Third Priority: bfd detnet lime nvo3 pim netmod

Special Requests:

---------------------------------------------------------
Hannes Gredler | 26 Apr 10:14 2016

WGLC on draft-ietf-rtgwg-rlfa-node-protection

hi,

I am not aware of any IPR other than the ones already disclosed:

https://datatracker.ietf.org/ipr/2334/
https://datatracker.ietf.org/ipr/2346/

rgds,

/hannes
Acee Lindem (acee) | 25 Apr 19:16 2016
Picon

Routing Directorate Review for "Use of BGP for routing in large-scale data centers" (adding RTG WG)

Hello,

I have been selected as the Routing Directorate reviewer for this draft.
The Routing Directorate seeks to review all routing or routing-related
drafts as they pass through IETF last call and IESG review, and sometimes
on special request. The purpose of the review is to provide assistance to
the Routing ADs. For more information about the Routing Directorate,
please see http://trac.tools.ietf.org/area/rtg/trac/wiki/RtgDir

Although these comments are primarily for the use of the Routing ADs, it
would be helpful if you could consider them along with any other IETF Last
Call comments that you receive, and strive to resolve them through
discussion or by updating the draft.

Document: draft-ietf-rtgwg-bgp-routing-large-dc-09.txt
Reviewer: Acee Lindem
Review Date: 4/25/16
IETF LC End Date: Not started
Intended Status: Informational

Summary:
    This document is basically ready for publication, but has some minor
issues and nits that should be resolved prior to publication.

Comments:
    The document starts with the requirements for MSDC routing and then
provides an overview of Clos topologies and data center network
design. This overview attempts to cover a lot of material in a very
small amount of text. While not completely successful, the overview
provides a lot of good information and references. The bulk of the
document covers the usage of EBGP as the sole data center routing protocol
and other aspects of the routing design including ECMP, summarization
issues, and convergence. These sections provide a very good guide for
using EBGP in a Clos data center and an excellent discussion of the
deployment issues (based on real deployment experience).

    The technical content of the document is excellent. The readability
could be improved by breaking up some of the run-on sentences and by
applying the suggested editorial changes (see Nits below).


Major Issues:

    I have no major issues with the document.

Minor Issues:

    Section 4.2: Can an informative reference be added for Direct Server
Return (DSR)?
    Section 5.2.4 and 7.4: Define precisely what is meant by "scale-out"
topology somewhere in the document.
    Section 5.2.5: Can you add a backward reference to the discussion of
"lack of peer links inside every tier"? Also, it would be good to describe
how this would allow for summarization and under what failure conditions.
    Section 7.4: Should you add a reference to
https://www.ietf.org/id/draft-ietf-rtgwg-bgp-pic-00.txt to the penultimate
paragraph in this section?

Nits:

***************
*** 143,149 ****
     network stability so that a small group of people can effectively
     support a significantly sized network.
  
!    Experimentation and extensive testing has shown that External BGP
     (EBGP) [RFC4271] is well suited as a stand-alone routing protocol for
     these type of data center applications.  This is in contrast with
     more traditional DC designs, which may use simple tree topologies and
--- 143,149 ----
     network stability so that a small group of people can effectively
     support a significantly sized network.
  
!    Experimentation and extensive testing have shown that External BGP
     (EBGP) [RFC4271] is well suited as a stand-alone routing protocol for
     these type of data center applications.  This is in contrast with
     more traditional DC designs, which may use simple tree topologies and
***************
*** 178,191 ****
  2.1.  Bandwidth and Traffic Patterns
  
     The primary requirement when building an interconnection network for
!    large number of servers is to accommodate application bandwidth and
     latency requirements.  Until recently it was quite common to see the
     majority of traffic entering and leaving the data center, commonly
     referred to as "north-south" traffic.  Traditional "tree" topologies
     were sufficient to accommodate such flows, even with high
     oversubscription ratios between the layers of the network.  If more
     bandwidth was required, it was added by "scaling up" the network
!    elements, e.g. by upgrading the device's linecards or fabrics or
     replacing the device with one with higher port density.
  
     Today many large-scale data centers host applications generating
--- 178,191 ----
  2.1.  Bandwidth and Traffic Patterns
  
     The primary requirement when building an interconnection network for
!    a large number of servers is to accommodate application bandwidth and
     latency requirements.  Until recently it was quite common to see the
     majority of traffic entering and leaving the data center, commonly
     referred to as "north-south" traffic.  Traditional "tree" topologies
     were sufficient to accommodate such flows, even with high
     oversubscription ratios between the layers of the network.  If more
     bandwidth was required, it was added by "scaling up" the network
!    elements, e.g., by upgrading the device's linecards or fabrics or
     replacing the device with one with higher port density.
  
     Today many large-scale data centers host applications generating
***************
*** 195,201 ****
     [HADOOP], massive data replication between clusters needed by certain
     applications, or virtual machine migrations.  Scaling traditional
     tree topologies to match these bandwidth demands becomes either too
!    expensive or impossible due to physical limitations, e.g. port
     density in a switch.
  
  2.2.  CAPEX Minimization
--- 195,201 ----
     [HADOOP], massive data replication between clusters needed by certain
     applications, or virtual machine migrations.  Scaling traditional
     tree topologies to match these bandwidth demands becomes either too
!    expensive or impossible due to physical limitations, e.g., port
     density in a switch.
  
  2.2.  CAPEX Minimization
***************
*** 209,215 ****
  
     o  Unifying all network elements, preferably using the same hardware
        type or even the same device.  This allows for volume pricing on
!       bulk purchases and reduced maintenance and sparing costs.
  
     o  Driving costs down using competitive pressures, by introducing
        multiple network equipment vendors.
--- 209,215 ----
  
     o  Unifying all network elements, preferably using the same hardware
        type or even the same device.  This allows for volume pricing on
!       bulk purchases and reduced maintenance and inventory costs.
  
     o  Driving costs down using competitive pressures, by introducing
        multiple network equipment vendors.
***************
*** 234,244 ****
     minimizes software issue-related failures.
  
     An important aspect of Operational Expenditure (OPEX) minimization is
!    reducing size of failure domains in the network.  Ethernet networks
     are known to be susceptible to broadcast or unicast traffic storms
     that can have a dramatic impact on network performance and
     availability.  The use of a fully routed design significantly reduces
!    the size of the data plane failure domains - i.e. limits them to the
     lowest level in the network hierarchy.  However, such designs
     introduce the problem of distributed control plane failures.  This
     observation calls for simpler and less control plane protocols to
--- 234,244 ----
     minimizes software issue-related failures.
  
     An important aspect of Operational Expenditure (OPEX) minimization is
!    reducing the size of failure domains in the network.  Ethernet networks
     are known to be susceptible to broadcast or unicast traffic storms
     that can have a dramatic impact on network performance and
     availability.  The use of a fully routed design significantly reduces
!    the size of the data plane failure domains, i.e., limits them to the
     lowest level in the network hierarchy.  However, such designs
     introduce the problem of distributed control plane failures.  This
     observation calls for simpler and less control plane protocols to
***************
*** 253,259 ****
     performed by network devices.  Traditionally, load balancers are
     deployed as dedicated devices in the traffic forwarding path.  The
     problem arises in scaling load balancers under growing traffic
!    demand.  A preferable solution would be able to scale load balancing
     layer horizontally, by adding more of the uniform nodes and
     distributing incoming traffic across these nodes.  In situations like
     this, an ideal choice would be to use network infrastructure itself
--- 253,259 ----
     performed by network devices.  Traditionally, load balancers are
     deployed as dedicated devices in the traffic forwarding path.  The
     problem arises in scaling load balancers under growing traffic
!    demand.  A preferable solution would be able to scale the load balancing
     layer horizontally, by adding more of the uniform nodes and
     distributing incoming traffic across these nodes.  In situations like
     this, an ideal choice would be to use network infrastructure itself
***************
*** 305,311 ****
  3.1.  Traditional DC Topology
  
     In the networking industry, a common design choice for data centers
!    typically look like a (upside down) tree with redundant uplinks and
     three layers of hierarchy namely; core, aggregation/distribution and
     access layers (see Figure 1).  To accommodate bandwidth demands, each
     higher layer, from server towards DC egress or WAN, has higher port
--- 305,311 ----
  3.1.  Traditional DC Topology
  
     In the networking industry, a common design choice for data centers
!    typically look like an (upside down) tree with redundant uplinks and
     three layers of hierarchy namely; core, aggregation/distribution and
     access layers (see Figure 1).  To accommodate bandwidth demands, each
     higher layer, from server towards DC egress or WAN, has higher port
***************
*** 373,379 ****
     topology, sometimes called "fat-tree" (see, for example, [INTERCON]
     and [ALFARES2008]).  This topology features an odd number of stages
     (sometimes known as dimensions) and is commonly made of uniform
!    elements, e.g. network switches with the same port count.  Therefore,
     the choice of folded Clos topology satisfies REQ1 and facilitates
     REQ2.  See Figure 2 below for an example of a folded 3-stage Clos
     topology (3 stages counting Tier-2 stage twice, when tracing a packet
--- 373,379 ----
     topology, sometimes called "fat-tree" (see, for example, [INTERCON]
     and [ALFARES2008]).  This topology features an odd number of stages
     (sometimes known as dimensions) and is commonly made of uniform
!    elements, e.g., network switches with the same port count.  Therefore,
     the choice of folded Clos topology satisfies REQ1 and facilitates
     REQ2.  See Figure 2 below for an example of a folded 3-stage Clos
     topology (3 stages counting Tier-2 stage twice, when tracing a packet
***************
*** 460,466 ****
  3.2.3.  Scaling the Clos topology
  
     A Clos topology can be scaled either by increasing network element
!    port density or adding more stages, e.g. moving to a 5-stage Clos, as
     illustrated in Figure 3 below:
  
                                        Tier-1
--- 460,466 ----
  3.2.3.  Scaling the Clos topology
  
     A Clos topology can be scaled either by increasing network element
!    port density or adding more stages, e.g., moving to a 5-stage Clos, as
     illustrated in Figure 3 below:
  
                                        Tier-1
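
To make the scaling argument concrete, here is a back-of-the-envelope
sketch in Python (helper names are mine; it assumes a non-oversubscribed
fabric of uniform k-port switches with half of each edge switch's ports
facing servers):

    # Rough server capacity of folded Clos fabrics built from uniform
    # k-port switches.  Adding stages multiplies capacity.
    def servers_3_stage(k: int) -> int:
        # k/2 Tier-1 (spine) devices allow up to k Tier-2 switches,
        # each with k/2 server-facing ports: k * k/2 servers.
        return k * (k // 2)

    def servers_5_stage(k: int) -> int:
        # k clusters, each with k/2 Tier-3 (ToR) switches, each ToR
        # with k/2 server-facing ports: k * (k/2) * (k/2) servers.
        return k * (k // 2) * (k // 2)

    print(servers_3_stage(64))   # 2048
    print(servers_5_stage(64))   # 65536
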
***************
*** 523,529 ****
  3.2.4.  Managing the Size of Clos Topology Tiers
  
     If a data center network size is small, it is possible to reduce the
!    number of switches in Tier-1 or Tier-2 of Clos topology by a factor
     of two.  To understand how this could be done, take Tier-1 as an
     example.  Every Tier-2 device connects to a single group of Tier-1
     devices.  If half of the ports on each of the Tier-1 devices are not
--- 523,529 ----
  3.2.4.  Managing the Size of Clos Topology Tiers
  
     If a data center network size is small, it is possible to reduce the
!    number of switches in Tier-1 or Tier-2 of a Clos topology by a factor
     of two.  To understand how this could be done, take Tier-1 as an
     example.  Every Tier-2 device connects to a single group of Tier-1
     devices.  If half of the ports on each of the Tier-1 devices are not
***************
*** 574,580 ****
     originally defined in [IEEE8021D-1990] for loop free topology
     creation, typically utilizing variants of the traditional DC topology
     described in Section 3.1.  At the time, many DC switches either did
!    not support Layer 3 routed protocols or supported it with additional
     licensing fees, which played a part in the design choice.  Although
     many enhancements have been made through the introduction of Rapid
     Spanning Tree Protocol (RSTP) in the latest revision of
--- 574,580 ----
     originally defined in [IEEE8021D-1990] for loop free topology
     creation, typically utilizing variants of the traditional DC topology
     described in Section 3.1.  At the time, many DC switches either did
!    not support Layer 3 routing protocols or supported them with additional
     licensing fees, which played a part in the design choice.  Although
     many enhancements have been made through the introduction of Rapid
     Spanning Tree Protocol (RSTP) in the latest revision of
***************
*** 599,605 ****
     as the backup for loop prevention.  The major downsides of this
     approach are the lack of ability to scale linearly past two in most
     implementations, lack of standards based implementations, and added
!    failure domain risk of keeping state between the devices.
  
     It should be noted that building large, horizontally scalable, Layer
     2 only networks without STP is possible recently through the
--- 599,605 ----
     as the backup for loop prevention.  The major downsides of this
     approach are the lack of ability to scale linearly past two in most
     implementations, lack of standards based implementations, and added
!    the failure domain risk of syncing state between the devices.
  
     It should be noted that building large, horizontally scalable, Layer
     2 only networks without STP is possible recently through the
***************
*** 621,631 ****
     Finally, neither the base TRILL specification nor the M-LAG approach
     totally eliminate the problem of the shared broadcast domain, that is
     so detrimental to the operations of any Layer 2, Ethernet based
!    solutions.  Later TRILL extensions have been proposed to solve the
     this problem statement primarily based on the approaches outlined in
     [RFC7067], but this even further limits the number of available
!    interoperable implementations that can be used to build a fabric,
!    therefore TRILL based designs have issues meeting REQ2, REQ3, and
     REQ4.
  
  4.2.  Hybrid L2/L3 Designs
--- 621,631 ----
     Finally, neither the base TRILL specification nor the M-LAG approach
     totally eliminate the problem of the shared broadcast domain, that is
     so detrimental to the operations of any Layer 2, Ethernet based
!    solution.  Later TRILL extensions have been proposed to solve
     this problem statement primarily based on the approaches outlined in
     [RFC7067], but this even further limits the number of available
!    interoperable implementations that can be used to build a fabric.
!    Therefore, TRILL based designs have issues meeting REQ2, REQ3, and
     REQ4.
  
  4.2.  Hybrid L2/L3 Designs
***************
*** 635,641 ****
     in either the Tier-1 or Tier-2 parts of the network and dividing the
     Layer 2 domain into numerous, smaller domains.  This design has
     allowed data centers to scale up, but at the cost of complexity in
!    the network managing multiple protocols.  For the following reasons,
     operators have retained Layer 2 in either the access (Tier-3) or both
     access and aggregation (Tier-3 and Tier-2) parts of the network:
  
--- 635,641 ----
     in either the Tier-1 or Tier-2 parts of the network and dividing the
     Layer 2 domain into numerous, smaller domains.  This design has
     allowed data centers to scale up, but at the cost of complexity in
!    managing multiple network protocols.  For the following reasons,
     operators have retained Layer 2 in either the access (Tier-3) or both
     access and aggregation (Tier-3 and Tier-2) parts of the network:
  
***************
*** 644,650 ****
  
     o  Seamless mobility for virtual machines that require the
        preservation of IP addresses when a virtual machine moves to
!       different Tier-3 switch.
  
     o  Simplified IP addressing = less IP subnets are required for the
        data center.
--- 644,650 ----
  
     o  Seamless mobility for virtual machines that require the
        preservation of IP addresses when a virtual machine moves to
!       a different Tier-3 switch.
  
     o  Simplified IP addressing = less IP subnets are required for the
        data center.
***************
*** 679,686 ****
     adoption in networks where large Layer 2 adjacency and larger size
     Layer 3 subnets are not as critical compared to network scalability
     and stability.  Application providers and network operators continue
!    to also develop new solutions to meet some of the requirements that
!    previously have driven large Layer 2 domains by using various overlay
     or tunneling techniques.
  
  5.  Routing Protocol Selection and Design
--- 679,686 ----
     adoption in networks where large Layer 2 adjacency and larger size
     Layer 3 subnets are not as critical compared to network scalability
     and stability.  Application providers and network operators continue
!    to develop new solutions to meet some of the requirements that
!    previously had driven large Layer 2 domains using various overlay
     or tunneling techniques.
  
  5.  Routing Protocol Selection and Design
***************
*** 700,706 ****
     design.
  
     Although EBGP is the protocol used for almost all inter-domain
!    routing on the Internet and has wide support from both vendor and
     service provider communities, it is not generally deployed as the
     primary routing protocol within the data center for a number of
     reasons (some of which are interrelated):
--- 700,706 ----
     design.
  
     Although EBGP is the protocol used for almost all inter-domain
!    routing in the Internet and has wide support from both vendor and
     service provider communities, it is not generally deployed as the
     primary routing protocol within the data center for a number of
     reasons (some of which are interrelated):
***************
*** 741,754 ****
        state IGPs.  Since every BGP router calculates and propagates only
        the best-path selected, a network failure is masked as soon as the
        BGP speaker finds an alternate path, which exists when highly
!       symmetric topologies, such as Clos, are coupled with EBGP only
        design.  In contrast, the event propagation scope of a link-state
        IGP is an entire area, regardless of the failure type.  In this
        way, BGP better meets REQ3 and REQ4.  It is also worth mentioning
        that all widely deployed link-state IGPs feature periodic
!       refreshes of routing information, even if this rarely causes
!       impact to modern router control planes, while BGP does not expire
!       routing state.
  
     o  BGP supports third-party (recursively resolved) next-hops.  This
        allows for manipulating multipath to be non-ECMP based or
--- 741,754 ----
        state IGPs.  Since every BGP router calculates and propagates only
        the best-path selected, a network failure is masked as soon as the
        BGP speaker finds an alternate path, which exists when highly
!       symmetric topologies, such as Clos, are coupled with an EBGP only
        design.  In contrast, the event propagation scope of a link-state
        IGP is an entire area, regardless of the failure type.  In this
        way, BGP better meets REQ3 and REQ4.  It is also worth mentioning
        that all widely deployed link-state IGPs feature periodic
!       refreshes of routing information while BGP does not expire
!       routing state, although this rarely impacts modern router control
!       planes.
  
     o  BGP supports third-party (recursively resolved) next-hops.  This
        allows for manipulating multipath to be non-ECMP based or
***************
*** 765,775 ****
        controlled and complex unwanted paths will be ignored.  See
        Section 5.2 for an example of a working ASN allocation scheme.  In
        a link-state IGP accomplishing the same goal would require multi-
!       (instance/topology/processes) support, typically not available in
        all DC devices and quite complex to configure and troubleshoot.
        Using a traditional single flooding domain, which most DC designs
        utilize, under certain failure conditions may pick up unwanted
!       lengthy paths, e.g. traversing multiple Tier-2 devices.
  
     o  EBGP configuration that is implemented with minimal routing policy
        is easier to troubleshoot for network reachability issues.  In
--- 765,775 ----
        controlled and complex unwanted paths will be ignored.  See
        Section 5.2 for an example of a working ASN allocation scheme.  In
        a link-state IGP accomplishing the same goal would require multi-
!       (instance/topology/process) support, typically not available in
        all DC devices and quite complex to configure and troubleshoot.
        Using a traditional single flooding domain, which most DC designs
        utilize, under certain failure conditions may pick up unwanted
!       lengthy paths, e.g., traversing multiple Tier-2 devices.
  
     o  EBGP configuration that is implemented with minimal routing policy
        is easier to troubleshoot for network reachability issues.  In
***************
*** 806,812 ****
        loopback sessions are used even in the case of multiple links
        between the same pair of nodes.
  
!    o  Private Use ASNs from the range 64512-65534 are used so as to
        avoid ASN conflicts.
  
     o  A single ASN is allocated to all of the Clos topology's Tier-1
--- 806,812 ----
        loopback sessions are used even in the case of multiple links
        between the same pair of nodes.
  
!    o  Private Use ASNs from the range 64512-65534 are used to
        avoid ASN conflicts.
  
     o  A single ASN is allocated to all of the Clos topology's Tier-1
***************
*** 815,821 ****
     o  A unique ASN is allocated to each set of Tier-2 devices in the
        same cluster.
  
!    o  A unique ASN is allocated to every Tier-3 device (e.g.  ToR) in
        this topology.
  
  
--- 815,821 ----
     o  A unique ASN is allocated to each set of Tier-2 devices in the
        same cluster.
  
!    o  A unique ASN is allocated to every Tier-3 device (e.g., ToR) in
        this topology.
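
The ASN scheme in these bullets is straightforward to mechanize.  A
minimal sketch (cluster counts and naming are hypothetical, not from
the draft):

    # Hand out 16-bit Private Use ASNs (64512-65534): one ASN for the
    # whole Tier-1 stage, one per Tier-2 cluster, and a unique ASN per
    # Tier-3 (ToR) device.  next() raises StopIteration once the 1023
    # available ASNs run out -- the exhaustion issue a later hunk
    # addresses via Four-Octet ASNs.
    PRIVATE_ASNS = iter(range(64512, 65535))

    def allocate_asns(num_clusters: int, tors_per_cluster: int) -> dict:
        plan = {"tier1": next(PRIVATE_ASNS)}
        plan["tier2"] = {c: next(PRIVATE_ASNS) for c in range(num_clusters)}
        plan["tier3"] = {(c, t): next(PRIVATE_ASNS)
                         for c in range(num_clusters)
                         for t in range(tors_per_cluster)}
        return plan

    # E.g., 10 clusters of 32 ToRs consume 1 + 10 + 320 = 331 ASNs.
    plan = allocate_asns(10, 32)
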
  
  
***************
*** 903,922 ****
  
     Another solution to this problem would be using Four-Octet ASNs
     ([RFC6793]), where there are additional Private Use ASNs available,
!    see [IANA.AS].  Use of Four-Octet ASNs put additional protocol
!    complexity in the BGP implementation so should be considered against
     the complexity of re-use when considering REQ3 and REQ4.  Perhaps
     more importantly, they are not yet supported by all BGP
     implementations, which may limit vendor selection of DC equipment.
!    When supported, ensure that implementations in use are able to remove
!    the Private Use ASNs if required for external connectivity
!    (Section 5.2.4).
  
  5.2.3.  Prefix Advertisement
  
     A Clos topology features a large number of point-to-point links and
     associated prefixes.  Advertising all of these routes into BGP may
!    create FIB overload conditions in the network devices.  Advertising
     these links also puts additional path computation stress on the BGP
     control plane for little benefit.  There are two possible solutions:
  
--- 903,922 ----
  
     Another solution to this problem would be using Four-Octet ASNs
     ([RFC6793]), where there are additional Private Use ASNs available,
!    see [IANA.AS].  Use of Four-Octet ASNs puts additional protocol
!    complexity in the BGP implementation and should be balanced against
     the complexity of re-use when considering REQ3 and REQ4.  Perhaps
     more importantly, they are not yet supported by all BGP
     implementations, which may limit vendor selection of DC equipment.
!    When supported, ensure that deployed implementations are able to remove
!    the Private Use ASNs when external connectivity to these ASes is
!    required (Section 5.2.4).
  
  5.2.3.  Prefix Advertisement
  
     A Clos topology features a large number of point-to-point links and
     associated prefixes.  Advertising all of these routes into BGP may
!    create FIB overload in the network devices.  Advertising
     these links also puts additional path computation stress on the BGP
     control plane for little benefit.  There are two possible solutions:
  
***************
*** 925,951 ****
        device, distant networks will automatically be reachable via the
        advertising EBGP peer and do not require reachability to these
        prefixes.  However, this may complicate operations or monitoring:
!       e.g. using the popular "traceroute" tool will display IP addresses
        that are not reachable.
  
     o  Advertise point-to-point links, but summarize them on every
        device.  This requires an address allocation scheme such as
        allocating a consecutive block of IP addresses per Tier-1 and
        Tier-2 device to be used for point-to-point interface addressing
!       to the lower layers (Tier-2 uplinks will be numbered out of Tier-1
!       addressing and so forth).
  
     Server subnets on Tier-3 devices must be announced into BGP without
     using route summarization on Tier-2 and Tier-1 devices.  Summarizing
     subnets in a Clos topology results in route black-holing under a
!    single link failure (e.g. between Tier-2 and Tier-3 devices) and
     hence must be avoided.  The use of peer links within the same tier to
     resolve the black-holing problem by providing "bypass paths" is
     undesirable due to O(N^2) complexity of the peering mesh and waste of
     ports on the devices.  An alternative to the full-mesh of peer-links
!    would be using a simpler bypass topology, e.g. a "ring" as described
     in [FB4POST], but such a topology adds extra hops and has very
!    limited bisection bandwidth, in addition requiring special tweaks to
  
  
  
--- 925,951 ----
        device, distant networks will automatically be reachable via the
        advertising EBGP peer and do not require reachability to these
        prefixes.  However, this may complicate operations or monitoring:
!       e.g., using the popular "traceroute" tool will display IP addresses
        that are not reachable.
  
     o  Advertise point-to-point links, but summarize them on every
        device.  This requires an address allocation scheme such as
        allocating a consecutive block of IP addresses per Tier-1 and
        Tier-2 device to be used for point-to-point interface addressing
!       to the lower layers (Tier-2 uplink addresses will be allocated
!       from Tier-1 address blocks and so forth).
  
     Server subnets on Tier-3 devices must be announced into BGP without
     using route summarization on Tier-2 and Tier-1 devices.  Summarizing
     subnets in a Clos topology results in route black-holing under a
!    single link failure (e.g., between Tier-2 and Tier-3 devices) and
     hence must be avoided.  The use of peer links within the same tier to
     resolve the black-holing problem by providing "bypass paths" is
     undesirable due to O(N^2) complexity of the peering mesh and waste of
     ports on the devices.  An alternative to the full-mesh of peer-links
!    would be using a simpler bypass topology, e.g., a "ring" as described
     in [FB4POST], but such a topology adds extra hops and has very
!    limited bisectional bandwidth. Additionally requiring special tweaks to
  
  
  
***************
*** 956,963 ****
  
     make BGP routing work - such as possibly splitting every device into
     an ASN on its own.  Later in this document, Section 8.2 introduces a
!    less intrusive method for performing a limited form route
!    summarization in Clos networks and discusses it's associated trade-
     offs.
  
  5.2.4.  External Connectivity
--- 956,963 ----
  
     make BGP routing work - such as possibly splitting every device into
     an ASN on its own.  Later in this document, Section 8.2 introduces a
!    less intrusive method for performing a limited form of route
!    summarization in Clos networks and discusses its associated trade-
     offs.
  
  5.2.4.  External Connectivity
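
Since both of the preceding hunks touch on why summarization is unsafe
in this design, a toy illustration may help (addresses and device names
are hypothetical; the lookup is a bare longest-prefix match):

    import ipaddress

    net = ipaddress.ip_network

    def lookup(fib, dst):
        # Longest-prefix match; returns None when nothing covers dst.
        matches = [p for p in fib if dst in p]
        return fib[max(matches, key=lambda p: p.prefixlen)] if matches else None

    # Tier-1 hears only a summary from Tier-2 device B covering the
    # subnets of ToRs A1 (10.0.0.0/24) and A2 (10.0.1.0/24).
    tier1_fib = {net("10.0.0.0/23"): "B"}
    # B's single link to A2 fails; B retains a route only for A1.
    b_fib = {net("10.0.0.0/24"): "A1"}

    dst = ipaddress.ip_address("10.0.1.10")   # a server behind A2
    assert lookup(tier1_fib, dst) == "B"      # summary still attracts traffic
    assert lookup(b_fib, dst) is None         # ...which B must black-hole
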
***************
*** 972,985 ****
     document.  These devices have to perform a few special functions:
  
     o  Hide network topology information when advertising paths to WAN
!       routers, i.e. remove Private Use ASNs [RFC6996] from the AS_PATH
        attribute.  This is typically done to avoid ASN number collisions
        between different data centers and also to provide a uniform
        AS_PATH length to the WAN for purposes of WAN ECMP to Anycast
        prefixes originated in the topology.  An implementation specific
        BGP feature typically called "Remove Private AS" is commonly used
        to accomplish this.  Depending on implementation, the feature
!       should strip a contiguous sequence of Private Use ASNs found in
        AS_PATH attribute prior to advertising the path to a neighbor.
        This assumes that all ASNs used for intra data center numbering
        are from the Private Use ranges.  The process for stripping the
--- 972,985 ----
     document.  These devices have to perform a few special functions:
  
     o  Hide network topology information when advertising paths to WAN
!       routers, i.e., remove Private Use ASNs [RFC6996] from the AS_PATH
        attribute.  This is typically done to avoid ASN number collisions
        between different data centers and also to provide a uniform
        AS_PATH length to the WAN for purposes of WAN ECMP to Anycast
        prefixes originated in the topology.  An implementation specific
        BGP feature typically called "Remove Private AS" is commonly used
        to accomplish this.  Depending on implementation, the feature
!       should strip a contiguous sequence of Private Use ASNs found in an
        AS_PATH attribute prior to advertising the path to a neighbor.
        This assumes that all ASNs used for intra data center numbering
        are from the Private Use ranges.  The process for stripping the
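
The "Remove Private AS" behavior described in this hunk can be sketched
as follows (one common variant that strips a leading contiguous run of
Private Use ASNs; implementations differ, and the ASN values are made
up):

    # RFC 6996 Private Use ranges: 64512-65534 and 4200000000-4294967294.
    def is_private(asn: int) -> bool:
        return 64512 <= asn <= 65534 or 4200000000 <= asn <= 4294967294

    def remove_private_as(as_path: list) -> list:
        # Strip the contiguous run of Private Use ASNs at the head of
        # the AS_PATH (nearest ASes first) before advertising to the WAN.
        out = list(as_path)
        while out and is_private(out[0]):
            out.pop(0)
        return out

    # A path that crossed Tier-1 (64601), a cluster (65001), and a ToR
    # (65101) leaves the data center with an empty AS_PATH; the border
    # router then prepends its own public ASN as usual.
    assert remove_private_as([64601, 65001, 65101]) == []
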
***************
*** 998,1005 ****
        to the WAN Routers upstream, to provide resistance to a single-
        link failure causing the black-holing of traffic.  To prevent
        black-holing in the situation when all of the EBGP sessions to the
!       WAN routers fail simultaneously on a given device it is more
!       desirable to take the "relaying" approach rather than introducing
        the default route via complicated conditional route origination
        schemes provided by some implementations [CONDITIONALROUTE].
  
--- 998,1005 ----
        to the WAN Routers upstream, to provide resistance to a single-
        link failure causing the black-holing of traffic.  To prevent
        black-holing in the situation when all of the EBGP sessions to the
!       WAN routers fail simultaneously on a given device, it is more
!       desirable to readvertise the default route rather than originating
        the default route via complicated conditional route origination
        schemes provided by some implementations [CONDITIONALROUTE].
  
***************
*** 1017,1023 ****
     prefixes originated from within the data center in a fully routed
     network design.  For example, a network with 2000 Tier-3 devices will
     have at least 2000 servers subnets advertised into BGP, along with
!    the infrastructure or other prefixes.  However, as discussed before,
     the proposed network design does not allow for route summarization
     due to the lack of peer links inside every tier.
  
--- 1017,1023 ----
     prefixes originated from within the data center in a fully routed
     network design.  For example, a network with 2000 Tier-3 devices will
     have at least 2000 servers subnets advertised into BGP, along with
!    the infrastructure and link prefixes.  However, as discussed before,
     the proposed network design does not allow for route summarization
     due to the lack of peer links inside every tier.
  
***************
*** 1028,1037 ****
     o  Interconnect the Border Routers using a full-mesh of physical
        links or using any other "peer-mesh" topology, such as ring or
        hub-and-spoke.  Configure BGP accordingly on all Border Leafs to
!       exchange network reachability information - e.g. by adding a mesh
        of IBGP sessions.  The interconnecting peer links need to be
        appropriately sized for traffic that will be present in the case
!       of a device or link failure underneath the Border Routers.
  
     o  Tier-1 devices may have additional physical links provisioned
        toward the Border Routers (which are Tier-2 devices from the
--- 1028,1037 ----
     o  Interconnect the Border Routers using a full-mesh of physical
        links or using any other "peer-mesh" topology, such as ring or
        hub-and-spoke.  Configure BGP accordingly on all Border Leafs to
!       exchange network reachability information, e.g., by adding a mesh
        of IBGP sessions.  The interconnecting peer links need to be
        appropriately sized for traffic that will be present in the case
!       of a device or link failure in the mesh connecting the Border Routers.
  
     o  Tier-1 devices may have additional physical links provisioned
        toward the Border Routers (which are Tier-2 devices from the
***************
*** 1043,1049 ****
        device compared with the other devices in the Clos.  This also
        reduces the number of ports available to "regular" Tier-2 switches
        and hence the number of clusters that could be interconnected via
!       Tier-1 layer.
  
     If any of the above options are implemented, it is possible to
     perform route summarization at the Border Routers toward the WAN
--- 1043,1049 ----
        device compared with the other devices in the Clos.  This also
        reduces the number of ports available to "regular" Tier-2 switches
        and hence the number of clusters that could be interconnected via
!       the Tier-1 layer.
  
     If any of the above options are implemented, it is possible to
     perform route summarization at the Border Routers toward the WAN
***************
*** 1071,1079 ****
     ECMP is the fundamental load sharing mechanism used by a Clos
     topology.  Effectively, every lower-tier device will use all of its
     directly attached upper-tier devices to load share traffic destined
!    to the same IP prefix.  Number of ECMP paths between any two Tier-3
     devices in Clos topology equals to the number of the devices in the
!    middle stage (Tier-1).  For example, Figure 5 illustrates the
     topology where Tier-3 device A has four paths to reach servers X and
     Y, via Tier-2 devices B and C and then Tier-1 devices 1, 2, 3, and 4
     respectively.
--- 1071,1079 ----
     ECMP is the fundamental load sharing mechanism used by a Clos
     topology.  Effectively, every lower-tier device will use all of its
     directly attached upper-tier devices to load share traffic destined
!    to the same IP prefix.  The number of ECMP paths between any two Tier-3
     devices in Clos topology equals to the number of the devices in the
!    middle stage (Tier-1).  For example, Figure 5 illustrates a
     topology where Tier-3 device A has four paths to reach servers X and
     Y, via Tier-2 devices B and C and then Tier-1 devices 1, 2, 3, and 4
     respectively.
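
The path arithmetic above, spelled out (the uplink map is my reading of
Figure 5):

    # Tier-3 device A reaches X via Tier-2 devices B and C; each Tier-2
    # uplinks to two of the four Tier-1 devices, giving one ECMP path
    # per Tier-1 (middle-stage) device.
    uplinks = {"B": ["1", "2"], "C": ["3", "4"]}
    paths = [(t2, t1) for t2, t1s in uplinks.items() for t1 in t1s]
    assert len(paths) == 4
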
***************
*** 1105,1116 ****
  
     The ECMP requirement implies that the BGP implementation must support
     multipath fan-out for up to the maximum number of devices directly
!    attached at any point in the topology in upstream or downstream
     direction.  Normally, this number does not exceed half of the ports
     found on a device in the topology.  For example, an ECMP fan-out of
     32 would be required when building a Clos network using 64-port
     devices.  The Border Routers may need to have wider fan-out to be
!    able to connect to multitude of Tier-1 devices if route summarization
     at Border Router level is implemented as described in Section 5.2.5.
     If a device's hardware does not support wider ECMP, logical link-
     grouping (link-aggregation at layer 2) could be used to provide
--- 1105,1116 ----
  
     The ECMP requirement implies that the BGP implementation must support
     multipath fan-out for up to the maximum number of devices directly
!    attached at any point in the topology in the upstream or downstream
     direction.  Normally, this number does not exceed half of the ports
     found on a device in the topology.  For example, an ECMP fan-out of
     32 would be required when building a Clos network using 64-port
     devices.  The Border Routers may need to have wider fan-out to be
!    able to connect to a multitude of Tier-1 devices if route summarization
     at Border Router level is implemented as described in Section 5.2.5.
     If a device's hardware does not support wider ECMP, logical link-
     grouping (link-aggregation at layer 2) could be used to provide
***************
*** 1122,1131 ****
  Internet-Draft    draft-ietf-rtgwg-bgp-routing-large-dc       March 2016
  
  
!    "hierarchical" ECMP (Layer 3 ECMP followed by Layer 2 ECMP) to
     compensate for fan-out limitations.  Such approach, however,
     increases the risk of flow polarization, as less entropy will be
!    available to the second stage of ECMP.
  
     Most BGP implementations declare paths to be equal from an ECMP
     perspective if they match up to and including step (e) in
--- 1122,1131 ----
  Internet-Draft    draft-ietf-rtgwg-bgp-routing-large-dc       March 2016
  
  
!    "hierarchical" ECMP (Layer 3 ECMP coupled with Layer 2 ECMP) to
     compensate for fan-out limitations.  Such approach, however,
     increases the risk of flow polarization, as less entropy will be
!    available at the second stage of ECMP.
  
     Most BGP implementations declare paths to be equal from an ECMP
     perspective if they match up to and including step (e) in
***************
*** 1148,1154 ****
     perspective of other devices, such a prefix would have BGP paths with
     different AS_PATH attribute values, while having the same AS_PATH
     attribute lengths.  Therefore, BGP implementations must support load
!    sharing over above-mentioned paths.  This feature is sometimes known
     as "multipath relax" or "multipath multiple-as" and effectively
     allows for ECMP to be done across different neighboring ASNs if all
     other attributes are equal as already described in the previous
--- 1148,1154 ----
     perspective of other devices, such a prefix would have BGP paths with
     different AS_PATH attribute values, while having the same AS_PATH
     attribute lengths.  Therefore, BGP implementations must support load
!    sharing over the above-mentioned paths.  This feature is sometimes known
     as "multipath relax" or "multipath multiple-as" and effectively
     allows for ECMP to be done across different neighboring ASNs if all
     other attributes are equal as already described in the previous
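
A minimal model of the "multipath relax" comparison described here
(attribute set and names are illustrative, not a full BGP best-path
implementation):

    # Treat two paths as ECMP-equal when they tie on the usual
    # attributes and on AS_PATH *length*, even if the AS_PATH
    # *contents* differ (paths learned via different neighboring ASNs).
    def ecmp_equal(p, q, multipath_relax=True):
        tie = (p["local_pref"] == q["local_pref"] and
               p["origin"] == q["origin"] and
               p["med"] == q["med"])
        if multipath_relax:
            return tie and len(p["as_path"]) == len(q["as_path"])
        return tie and p["as_path"] == q["as_path"]

    via_b = {"as_path": [65001, 65101], "local_pref": 100,
             "origin": "i", "med": 0}
    via_c = {"as_path": [65002, 65101], "local_pref": 100,
             "origin": "i", "med": 0}
    assert ecmp_equal(via_b, via_c)               # load-shared across ASNs
    assert not ecmp_equal(via_b, via_c, False)    # classic rule: no ECMP
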
***************
*** 1182,1199 ****
  
     It is often desirable to have the hashing function used for ECMP to
     be consistent (see [CONS-HASH]), to minimize the impact on flow to
!    next-hop affinity changes when a next-hop is added or removed to ECMP
     group.  This could be used if the network device is used as a load
     balancer, mapping flows toward multiple destinations - in this case,
!    losing or adding a destination will not have detrimental effect of
     currently established flows.  One particular recommendation on
     implementing consistent hashing is provided in [RFC2992], though
     other implementations are possible.  This functionality could be
     naturally combined with weighted ECMP, with the impact of the next-
     hop changes being proportional to the weight of the given next-hop.
     The downside of consistent hashing is increased load on hardware
!    resource utilization, as typically more space is required to
!    implement a consistent-hashing region.
  
  7.  Routing Convergence Properties
  
--- 1182,1199 ----
  
     It is often desirable to have the hashing function used for ECMP to
     be consistent (see [CONS-HASH]), to minimize the impact on flow to
!    next-hop affinity changes when a next-hop is added to or removed from an ECMP
     group.  This could be used if the network device is used as a load
     balancer, mapping flows toward multiple destinations - in this case,
!    losing or adding a destination will not have a detrimental effect on
     currently established flows.  One particular recommendation on
     implementing consistent hashing is provided in [RFC2992], though
     other implementations are possible.  This functionality could be
     naturally combined with weighted ECMP, with the impact of the next-
     hop changes being proportional to the weight of the given next-hop.
     The downside of consistent hashing is increased load on hardware
!    resource utilization, as typically more resources (e.g., TCAM space)
!    are required to implement a consistent-hashing function.
  
  7.  Routing Convergence Properties
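
For the consistent-hashing property discussed in this hunk, one simple
scheme with the desired flow-affinity behavior is highest-random-weight
(rendezvous) hashing -- shown purely as an illustration, not as the
specific mechanism of [RFC2992]:

    import hashlib

    def pick_next_hop(flow: str, next_hops: list) -> str:
        # Each (flow, next-hop) pair gets a deterministic score; the
        # flow sticks with whichever live next-hop scores highest.
        return max(next_hops, key=lambda nh:
                   hashlib.sha256(f"{flow}|{nh}".encode()).digest())

    flow = "10.0.1.10:34567->192.0.2.1:443"
    group = ["nh1", "nh2", "nh3", "nh4"]
    before = pick_next_hop(flow, group)
    after = pick_next_hop(flow, [nh for nh in group if nh != "nh2"])
    # Removing nh2 moves only the flows that were on nh2; all other
    # flows keep their next-hop.
    assert after == before or before == "nh2"
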
  
***************
*** 1209,1224 ****
     driven mechanism to obtain updates on IGP state changes.  The
     proposed routing design does not use an IGP, so the remaining
     mechanisms that could be used for fault detection are BGP keep-alive
!    process (or any other type of keep-alive mechanism) and link-failure
     triggers.
  
     Relying solely on BGP keep-alive packets may result in high
!    convergence delays, in the order of multiple seconds (on many BGP
     implementations the minimum configurable BGP hold timer value is
     three seconds).  However, many BGP implementations can shut down
     local EBGP peering sessions in response to the "link down" event for
     the outgoing interface used for BGP peering.  This feature is
!    sometimes called as "fast fallover".  Since links in modern data
     centers are predominantly point-to-point fiber connections, a
     physical interface failure is often detected in milliseconds and
     subsequently triggers a BGP re-convergence.
--- 1209,1224 ----
     driven mechanism to obtain updates on IGP state changes.  The
     proposed routing design does not use an IGP, so the remaining
     mechanisms that could be used for fault detection are BGP keep-alive
!    time-out (or any other type of keep-alive mechanism) and link-failure
     triggers.
  
     Relying solely on BGP keep-alive packets may result in high
!    convergence delays, on the order of multiple seconds (on many BGP
     implementations the minimum configurable BGP hold timer value is
     three seconds).  However, many BGP implementations can shut down
     local EBGP peering sessions in response to the "link down" event for
     the outgoing interface used for BGP peering.  This feature is
!    sometimes called "fast fallover".  Since links in modern data
     centers are predominantly point-to-point fiber connections, a
     physical interface failure is often detected in milliseconds and
     subsequently triggers a BGP re-convergence.
***************
*** 1236,1242 ****
  
     Alternatively, some platforms may support Bidirectional Forwarding
     Detection (BFD) [RFC5880] to allow for sub-second failure detection
!    and fault signaling to the BGP process.  However, use of either of
     these presents additional requirements to vendor software and
     possibly hardware, and may contradict REQ1.  Until recently with
     [RFC7130], BFD also did not allow detection of a single member link
--- 1236,1242 ----
  
     Alternatively, some platforms may support Bidirectional Forwarding
     Detection (BFD) [RFC5880] to allow for sub-second failure detection
!    and fault signaling to the BGP process.  However, the use of either of
     these presents additional requirements to vendor software and
     possibly hardware, and may contradict REQ1.  Until recently with
     [RFC7130], BFD also did not allow detection of a single member link
***************
*** 1245,1251 ****
  
  7.2.  Event Propagation Timing
  
!    In the proposed design the impact of BGP Minimum Route Advertisement
     Interval (MRAI) timer (See section 9.2.1.1 of [RFC4271]) should be
     considered.  Per the standard it is required for BGP implementations
     to space out consecutive BGP UPDATE messages by at least MRAI
--- 1245,1251 ----
  
  7.2.  Event Propagation Timing
  
!    In the proposed design the impact of the BGP Minimum Route Advertisement
     Interval (MRAI) timer (See section 9.2.1.1 of [RFC4271]) should be
     considered.  Per the standard it is required for BGP implementations
     to space out consecutive BGP UPDATE messages by at least MRAI
***************
*** 1258,1270 ****
     In a Clos topology each EBGP speaker typically has either one path
     (Tier-2 devices don't accept paths from other Tier-2 in the same
     cluster due to same ASN) or N paths for the same prefix, where N is a
!    significantly large number, e.g.  N=32 (the ECMP fan-out to the next
     Tier).  Therefore, if a link fails to another device from which a
!    path is received there is either no backup path at all (e.g. from
     perspective of a Tier-2 switch losing link to a Tier-3 device), or
!    the backup is readily available in BGP Loc-RIB (e.g. from perspective
     of a Tier-2 device losing link to a Tier-1 switch).  In the former
!    case, the BGP withdrawal announcement will propagate un-delayed and
     trigger re-convergence on affected devices.  In the latter case, the
     best-path will be re-evaluated and the local ECMP group corresponding
     to the new next-hop set changed.  If the BGP path was the best-path
--- 1258,1270 ----
     In a Clos topology each EBGP speaker typically has either one path
     (Tier-2 devices don't accept paths from other Tier-2 in the same
     cluster due to same ASN) or N paths for the same prefix, where N is a
!    significantly large number, e.g., N=32 (the ECMP fan-out to the next
     Tier).  Therefore, if a link fails to another device from which a
!    path is received there is either no backup path at all (e.g., from the
     perspective of a Tier-2 switch losing link to a Tier-3 device), or
!    the backup is readily available in BGP Loc-RIB (e.g., from the perspective
     of a Tier-2 device losing link to a Tier-1 switch).  In the former
!    case, the BGP withdrawal announcement will propagate without delay and
     trigger re-convergence on affected devices.  In the latter case, the
     best-path will be re-evaluated and the local ECMP group corresponding
     to the new next-hop set changed.  If the BGP path was the best-path
***************
*** 1279,1285 ****
     situation when a link between Tier-3 and Tier-2 device fails, the
     Tier-2 device will send BGP UPDATE messages to all upstream Tier-1
     devices, withdrawing the affected prefixes.  The Tier-1 devices, in
!    turn, will relay those messages to all downstream Tier-2 devices
     (except for the originator).  Tier-2 devices other than the one
     originating the UPDATE should then wait for ALL upstream Tier-1
  
--- 1279,1285 ----
     situation when a link between Tier-3 and Tier-2 device fails, the
     Tier-2 device will send BGP UPDATE messages to all upstream Tier-1
     devices, withdrawing the affected prefixes.  The Tier-1 devices, in
!    turn, will relay these messages to all downstream Tier-2 devices
     (except for the originator).  Tier-2 devices other than the one
     originating the UPDATE should then wait for ALL upstream Tier-1
  
***************
*** 1307,1313 ****
     features that vendors include to reduce the control plane impact of
     rapidly flapping prefixes.  However, due to issues described with
     false positives in these implementations especially under such
!    "dispersion" events, it is not recommended to turn this feature on in
     this design.  More background and issues with "route flap dampening"
     and possible implementation changes that could affect this are well
     described in [RFC7196].
--- 1307,1313 ----
     features that vendors include to reduce the control plane impact of
     rapidly flapping prefixes.  However, due to issues described with
     false positives in these implementations especially under such
!    "dispersion" events, it is not recommended to enable this feature in
     this design.  More background and issues with "route flap dampening"
     and possible implementation changes that could affect this are well
     described in [RFC7196].
***************
*** 1316,1324 ****
  
     A network is declared to converge in response to a failure once all
     devices within the failure impact scope are notified of the event and
!    have re-calculated their RIB's and consequently updated their FIB's.
     Larger failure impact scope typically means slower convergence since
!    more devices have to be notified, and additionally results in a less
     stable network.  In this section we describe BGP's advantages over
     link-state routing protocols in reducing failure impact scope for a
     Clos topology.
--- 1316,1324 ----
  
     A network is declared to converge in response to a failure once all
     devices within the failure impact scope are notified of the event and
!    have re-calculated their RIBs and consequently updated their FIBs.
     Larger failure impact scope typically means slower convergence since
!    more devices have to be notified, and results in a less
     stable network.  In this section we describe BGP's advantages over
     link-state routing protocols in reducing failure impact scope for a
     Clos topology.
***************
*** 1327,1335 ****
     the best path from the point of view of the local router is sent to
     neighbors.  As such, some failures are masked if the local node can
     immediately find a backup path and does not have to send any updates
!    further.  Notice that in the worst case ALL devices in a data center
     topology have to either withdraw a prefix completely or update the
!    ECMP groups in the FIB.  However, many failures will not result in
     such a wide impact.  There are two main failure types where impact
     scope is reduced:
  
--- 1327,1335 ----
     the best path from the point of view of the local router is sent to
     neighbors.  As such, some failures are masked if the local node can
     immediately find a backup path and does not have to send any updates
!    further.  Notice that in the worst case, all devices in a data center
     topology have to either withdraw a prefix completely or update the
!    ECMP groups in their FIBs.  However, many failures will not result in
     such a wide impact.  There are two main failure types where impact
     scope is reduced:
  
***************
*** 1357,1367 ****
  
     o  Failure of a Tier-1 device: In this case, all Tier-2 devices
        directly attached to the failed node will have to update their
!       ECMP groups for all IP prefixes from non-local cluster.  The
        Tier-3 devices are once again not involved in the re-convergence
        process, but may receive "implicit withdraws" as described above.
  
!    Even though in case of such failures multiple IP prefixes will have
     to be reprogrammed in the FIB, it is worth noting that ALL of these
     prefixes share a single ECMP group on Tier-2 device.  Therefore, in
     the case of implementations with a hierarchical FIB, only a single
--- 1357,1367 ----
  
     o  Failure of a Tier-1 device: In this case, all Tier-2 devices
        directly attached to the failed node will have to update their
!       ECMP groups for all IP prefixes from a non-local cluster.  The
        Tier-3 devices are once again not involved in the re-convergence
        process, but may receive "implicit withdraws" as described above.
  
!    Even though in the case of such failures multiple IP prefixes will have
     to be reprogrammed in the FIB, it is worth noting that ALL of these
     prefixes share a single ECMP group on Tier-2 device.  Therefore, in
     the case of implementations with a hierarchical FIB, only a single
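
The hierarchical-FIB point in this hunk fits in a few lines (a toy
model in which many prefixes share one mutable ECMP-group object):

    # Repairing a Tier-1 failure is one in-place update to the shared
    # group, not one update per prefix.
    ecmp_group = {"next_hops": ["t1-1", "t1-2", "t1-3", "t1-4"]}
    fib = {p: ecmp_group
           for p in ("10.1.0.0/24", "10.2.0.0/24", "10.3.0.0/24")}

    ecmp_group["next_hops"].remove("t1-3")    # Tier-1 device t1-3 fails
    assert all(e["next_hops"] == ["t1-1", "t1-2", "t1-4"]
               for e in fib.values())
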
***************
*** 1375,1381 ****
     possible with the proposed design, since using this technique may
     create routing black-holes as mentioned previously.  Therefore, the
     worst control plane failure impact scope is the network as a whole,
!    for instance in a case of a link failure between Tier-2 and Tier-3
     devices.  The amount of impacted prefixes in this case would be much
     less than in the case of a failure in the upper layers of a Clos
     network topology.  The property of having such large failure scope is
--- 1375,1381 ----
     possible with the proposed design, since using this technique may
     create routing black-holes as mentioned previously.  Therefore, the
     worst control plane failure impact scope is the network as a whole,
!    for instance in the case of a link failure between Tier-2 and Tier-3
     devices.  The amount of impacted prefixes in this case would be much
     less than in the case of a failure in the upper layers of a Clos
     network topology.  The property of having such large failure scope is
***************
*** 1384,1397 ****
  
  7.5.  Routing Micro-Loops
  
!    When a downstream device, e.g.  Tier-2 device, loses all paths for a
     prefix, it normally has the default route pointing toward the
     upstream device, in this case the Tier-1 device.  As a result, it is
!    possible to get in the situation when Tier-2 switch loses a prefix,
!    but Tier-1 switch still has the path pointing to the Tier-2 device,
!    which results in transient micro-loop, since Tier-1 switch will keep
     passing packets to the affected prefix back to Tier-2 device, and
!    Tier-2 will bounce it back again using the default route.  This
     micro-loop will last for the duration of time it takes the upstream
     device to fully update its forwarding tables.
  
--- 1384,1397 ----
  
  7.5.  Routing Micro-Loops
  
!    When a downstream device, e.g., a Tier-2 device, loses all paths for a
     prefix, it normally has the default route pointing toward the
     upstream device, in this case the Tier-1 device.  As a result, it is
!    possible to get into the situation where a Tier-2 switch loses a prefix,
!    but a Tier-1 switch still has the path pointing to the Tier-2 device,
!    which results in a transient micro-loop, since the Tier-1 switch will keep
     passing packets to the affected prefix back to Tier-2 device, and
!    the Tier-2 switch will bounce it back again using the default route.  This
     micro-loop will last for the duration of time it takes the upstream
     device to fully update its forwarding tables.
  
***************
*** 1402,1408 ****
  Internet-Draft    draft-ietf-rtgwg-bgp-routing-large-dc       March 2016
  
  
!    To minimize impact of the micro-loops, Tier-2 and Tier-1 switches can
     be configured with static "discard" or "null" routes that will be
     more specific than the default route for prefixes missing during
     network convergence.  For Tier-2 switches, the discard route should
--- 1402,1408 ----
  Internet-Draft    draft-ietf-rtgwg-bgp-routing-large-dc       March 2016
  
  
!    To minimize the impact of such micro-loops, Tier-2 and Tier-1 switches can
     be configured with static "discard" or "null" routes that will be
     more specific than the default route for prefixes missing during
     network convergence.  For Tier-2 switches, the discard route should
***************
*** 1417,1423 ****
  
  8.1.  Third-party Route Injection
  
!    BGP allows for a "third-party", i.e. directly attached, BGP speaker
     to inject routes anywhere in the network topology, meeting REQ5.
     This can be achieved by peering via a multihop BGP session with some
     or even all devices in the topology.  Furthermore, BGP diverse path
--- 1417,1423 ----
  
  8.1.  Third-party Route Injection
  
!    BGP allows for a "third-party", i.e., directly attached, BGP speaker
     to inject routes anywhere in the network topology, meeting REQ5.
     This can be achieved by peering via a multihop BGP session with some
     or even all devices in the topology.  Furthermore, BGP diverse path
***************
*** 1427,1433 ****
     implementation.  Unfortunately, in many implementations ADD-PATH has
     been found to only support IBGP properly due to the use cases it was
     originally optimized for, which limits the "third-party" peering to
!    IBGP only, if the feature is used.
  
     To implement route injection in the proposed design, a third-party
     BGP speaker may peer with Tier-3 and Tier-1 switches, injecting the
--- 1427,1433 ----
     implementation.  Unfortunately, in many implementations ADD-PATH has
     been found to only support IBGP properly due to the use cases it was
     originally optimized for, which limits the "third-party" peering to
!    IBGP only.
  
     To implement route injection in the proposed design, a third-party
     BGP speaker may peer with Tier-3 and Tier-1 switches, injecting the
***************
*** 1442,1453 ****
     As mentioned previously, route summarization is not possible within
     the proposed Clos topology since it makes the network susceptible to
     route black-holing under single link failures.  The main problem is
!    the limited number of redundant paths between network elements, e.g.
     there is only a single path between any pair of Tier-1 and Tier-3
     devices.  However, some operators may find route aggregation
     desirable to improve control plane stability.
  
!    If planning on using any technique to summarize within the topology
     modeling of the routing behavior and potential for black-holing
     should be done not only for single or multiple link failures, but
  
--- 1442,1453 ----
     As mentioned previously, route summarization is not possible within
     the proposed Clos topology since it makes the network susceptible to
     route black-holing under single link failures.  The main problem is
!    the limited number of redundant paths between network elements, e.g.,
     there is only a single path between any pair of Tier-1 and Tier-3
     devices.  However, some operators may find route aggregation
     desirable to improve control plane stability.
  
!    If any technique to summarize within the topology is planned,
     modeling of the routing behavior and potential for black-holing
     should be done not only for single or multiple link failures, but
  
***************
*** 1458,1468 ****
  Internet-Draft    draft-ietf-rtgwg-bgp-routing-large-dc       March 2016
  
  
!    also fiber pathway failures or optical domain failures if the
     topology extends beyond a physical location.  Simple modeling can be
     done by checking the reachability on devices doing summarization
     under the condition of a link or pathway failure between a set of
!    devices in every tier as well as to the WAN routers if external
     connectivity is present.
  
     Route summarization would be possible with a small modification to
--- 1458,1468 ----
  Internet-Draft    draft-ietf-rtgwg-bgp-routing-large-dc       March 2016
  
  
!    also fiber pathway failures or optical domain failures when the
     topology extends beyond a physical location.  Simple modeling can be
     done by checking the reachability on devices doing summarization
     under the condition of a link or pathway failure between a set of
!    devices in every tier as well as to the WAN routers when external
     connectivity is present.
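
   One cheap way to do such modeling is a reachability check over the
   topology graph with each candidate failure removed; a minimal sketch
   follows (the adjacency, device names, and failure set are invented
   for illustration):

       # Toy reachability check under single link failures.
       links = {("t1a", "t2a"), ("t1a", "t2b"),
                ("t2a", "t3a"), ("t2b", "t3a")}

       def reachable(src, dst, up_links):
           seen, stack = {src}, [src]
           while stack:
               node = stack.pop()
               if node == dst:
                   return True
               for a, b in up_links:
                   if a == node and b not in seen:
                       seen.add(b); stack.append(b)
                   elif b == node and a not in seen:
                       seen.add(a); stack.append(a)
           return False

       # Verify the summarizing device t1a still reaches t3a under every
       # single link failure; extend to pairs to model double failures.
       for failed in links:
           ok = reachable("t1a", "t3a", links - {failed})
           print(failed, "ok" if ok else "BLACK-HOLE")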
  
     Route summarization would be possible with a small modification to
***************
*** 1519,1544 ****
     cluster from Tier-2 devices since each of them has only a single path
     down to this prefix.  It would require dual-homed servers to
     accomplish that.  Also note that this design is only resilient to
!    single link failure.  It is possible for a double link failure to
     isolate a Tier-2 device from all paths toward a specific Tier-3
     device, thus causing a routing black-hole.
  
!    A result of the proposed topology modification would be reduction of
     Tier-1 devices port capacity.  This limits the maximum number of
     attached Tier-2 devices and therefore will limit the maximum DC
     network size.  A larger network would require different Tier-1
     devices that have higher port density to implement this change.
  
     Another problem is traffic re-balancing under link failures.  Since
!    three are two paths from Tier-1 to Tier-3, a failure of the link
     between Tier-1 and Tier-2 switch would result in all traffic that was
     taking the failed link to switch to the remaining path.  This will
!    result in doubling of link utilization on the remaining link.
  
  8.2.2.  Simple Virtual Aggregation
  
     A completely different approach to route summarization is possible,
!    provided that the main goal is to reduce the FIB pressure, while
     allowing the control plane to disseminate full routing information.
     Firstly, it could be easily noted that in many cases multiple
     prefixes, some of which are less specific, share the same set of the
--- 1519,1544 ----
     cluster from Tier-2 devices since each of them has only a single path
     down to this prefix.  It would require dual-homed servers to
     accomplish that.  Also note that this design is only resilient to
!    single link failures.  It is possible for a double link failure to
     isolate a Tier-2 device from all paths toward a specific Tier-3
     device, thus causing a routing black-hole.
  
!    A result of the proposed topology modification would be a reduction of
     Tier-1 devices port capacity.  This limits the maximum number of
     attached Tier-2 devices and therefore will limit the maximum DC
     network size.  A larger network would require different Tier-1
     devices that have higher port density to implement this change.
  
     Another problem is traffic re-balancing under link failures.  Since
!    there are two paths from Tier-1 to Tier-3, a failure of the link
     between Tier-1 and Tier-2 switch would result in all traffic that was
     taking the failed link to switch to the remaining path.  This will
!    result in doubling the utilization of the remaining link.
  
  8.2.2.  Simple Virtual Aggregation
  
     A completely different approach to route summarization is possible,
!    provided that the main goal is to reduce the FIB size, while
     allowing the control plane to disseminate full routing information.
     Firstly, it could be easily noted that in many cases multiple
     prefixes, some of which are less specific, share the same set of the
***************
*** 1550,1563 ****
     [RFC6769] and only install the least specific route in the FIB,
     ignoring more specific routes if they share the same next-hop set.
     For example, under normal network conditions, only the default route
!    need to be programmed into FIB.
  
     Furthermore, if the Tier-2 devices are configured with summary
!    prefixes covering all of their attached Tier-3 device's prefixes the
     same logic could be applied in Tier-1 devices as well, and, by
     induction to Tier-2/Tier-3 switches in different clusters.  These
     summary routes should still allow for more specific prefixes to leak
!    to Tier-1 devices, to enable for detection of mismatches in the next-
     hop sets if a particular link fails, changing the next-hop set for a
     specific prefix.
  
--- 1550,1563 ----
     [RFC6769] and only install the least specific route in the FIB,
     ignoring more specific routes if they share the same next-hop set.
     For example, under normal network conditions, only the default route
!    needs to be programmed into the FIB.
  
     Furthermore, if the Tier-2 devices are configured with summary
!    prefixes covering all of their attached Tier-3 devices' prefixes, the
     same logic could be applied in Tier-1 devices as well, and, by
     induction to Tier-2/Tier-3 switches in different clusters.  These
     summary routes should still allow for more specific prefixes to leak
!    to Tier-1 devices, to enable detection of mismatches in the next-
     hop sets if a particular link fails, changing the next-hop set for a
     specific prefix.
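
   A toy version of that install-time filter, with invented prefixes and
   next-hop sets (the actual mechanism is the one specified in
   [RFC6769]):

       # Skip installing a more specific route when a covering, less
       # specific route already resolves to the same next-hop set.
       import ipaddress

       rib = {
           "0.0.0.0/0":   frozenset({"nh1", "nh2"}),
           "10.1.0.0/16": frozenset({"nh1", "nh2"}),  # suppressible
           "10.1.2.0/24": frozenset({"nh3"}),         # must be installed
       }

       def compress(rib):
           fib = {}
           # Least specific first, so covering routes are installed
           # before their more specifics are considered.
           order = sorted(rib, key=lambda p: ipaddress.ip_network(p).prefixlen)
           for prefix in order:
               net, nhs = ipaddress.ip_network(prefix), rib[prefix]
               covered = any(
                   net != other and net.subnet_of(other)
                   and fib[str(other)] == nhs
                   for other in map(ipaddress.ip_network, list(fib))
               )
               if not covered:
                   fib[prefix] = nhs
           return fib

       print(compress(rib))  # installs only 0.0.0.0/0 and 10.1.2.0/24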
  
***************
*** 1571,1584 ****
  
  
     Re-stating once again, this technique does not reduce the amount of
!    control plane state (i.e.  BGP UPDATEs/BGP LocRIB sizing), but only
!    allows for more efficient FIB utilization, by spotting more specific
!    prefixes that share their next-hops with less specifics.
  
  8.3.  ICMP Unreachable Message Masquerading
  
     This section discusses some operational aspects of not advertising
!    point-to-point link subnets into BGP, as previously outlined as an
     option in Section 5.2.3.  The operational impact of this decision
     could be seen when using the well-known "traceroute" tool.
     Specifically, IP addresses displayed by the tool will be the link's
--- 1571,1585 ----
  
  
     Re-stating once again, this technique does not reduce the amount of
!    control plane state (i.e., BGP UPDATEs/BGP Loc-RIB size), but only
!    allows for more efficient FIB utilization, by detecting more specific
!    prefixes that share their next-hop set with a subsuming less specific
!    prefix.
  
  8.3.  ICMP Unreachable Message Masquerading
  
     This section discusses some operational aspects of not advertising
!    point-to-point link subnets into BGP, as previously identified as an
     option in Section 5.2.3.  The operational impact of this decision
     could be seen when using the well-known "traceroute" tool.
     Specifically, IP addresses displayed by the tool will be the link's
***************
*** 1587,1605 ****
     complicated.
  
     One way to overcome this limitation is by using the DNS subsystem to
!    create the "reverse" entries for the IP addresses of the same device
!    pointing to the same name.  The connectivity then can be made by
!    resolving this name to the "primary" IP address of the devices, e.g.
     its Loopback interface, which is always advertised into BGP.
     However, this creates a dependency on the DNS subsystem, which may be
     unavailable during an outage.
  
     Another option is to make the network device perform IP address
     masquerading, that is rewriting the source IP addresses of the
!    appropriate ICMP messages sent off of the device with the "primary"
     IP address of the device.  Specifically, the ICMP Destination
     Unreachable Message (type 3) codes 3 (port unreachable) and ICMP Time
!    Exceeded (type 11) code 0, which are involved in proper working of
     the "traceroute" tool.  With this modification, the "traceroute"
     probes sent to the devices will always be sent back with the
     "primary" IP address as the source, allowing the operator to discover
--- 1588,1606 ----
     complicated.
  
     One way to overcome this limitation is by using the DNS subsystem to
!    create the "reverse" entries for these point-to-point IP addresses
pointing
!    to a the same name as the loopback address.  The connectivity then
can be made by
!    resolving this name to the "primary" IP address of the devices, e.g.,
     its Loopback interface, which is always advertised into BGP.
     However, this creates a dependency on the DNS subsystem, which may be
     unavailable during an outage.
  
     Another option is to make the network device perform IP address
     masquerading, that is rewriting the source IP addresses of the
!    appropriate ICMP messages sent by the device with the "primary"
     IP address of the device.  Specifically, the ICMP Destination
     Unreachable Message (type 3) codes 3 (port unreachable) and ICMP Time
!    Exceeded (type 11) code 0, which are required for correct operation of
     the "traceroute" tool.  With this modification, the "traceroute"
     probes sent to the devices will always be sent back with the
     "primary" IP address as the source, allowing the operator to discover

Thanks,
Acee

_______________________________________________
rtgwg mailing list
rtgwg <at> ietf.org
https://www.ietf.org/mailman/listinfo/rtgwg
Jeff Tantsura | 21 Apr 09:42 2016

WGLC on draft-ietf-rtgwg-rlfa-node-protection

Dear RTGWG,
 
The authors of draft-ietf-rtgwg-rlfa-node-protection have told us that the
draft is ready for working group last call (WGLC).
 
Before we do the WGLC we want to do an IPR poll on the document.
 
This mail starts that IPR poll.
 
Are you aware of any IPR that applies to draft-ietf-rtgwg-rlfa-node-protection?
 
If so, has this IPR been disclosed in compliance with IETF IPR rules
(see RFCs 3979, 4879, 3669 and 5378 for more details).
 
Currently there are two IPR disclosures on draft-psarkar-rtgwg-rlfa-node-protection
(which was the pre-working group version of draft-ietf-rtgwg-rlfa-node-protection):

If you are listed as a document author or contributor please respond to
this email regardless of whether or not you are aware of any relevant
IPR. *The response needs to be sent to the MPLS wg mailing list.* The
document will not advance to the next stage until a response has been
received from each author and contributor.
 
If you are on the RTGWG email list but are not listed as an author or
contributor, then please explicitly respond only if you are aware of any
IPR that has not yet been disclosed in conformance with IETF rules.
 
Thanks, 
Jeff and Chris
_______________________________________________
rtgwg mailing list
rtgwg <at> ietf.org
https://www.ietf.org/mailman/listinfo/rtgwg
Jeff Tantsura | 21 Apr 09:40 2016
Picon

change of affiliation

Dear RTGWG,

I have decided to leave Ericsson and would like to notify you about the change.
I’ll update you with my new affiliation after I have decided what it is going to be.

Thanks,
Jeff 
_______________________________________________
rtgwg mailing list
rtgwg <at> ietf.org
https://www.ietf.org/mailman/listinfo/rtgwg
bruno.decraene | 20 Apr 15:53 2016

RtgDir review: draft-ietf-rtgwg-bgp-pic-00

Hello,

I have been selected as the Routing Directorate reviewer for this draft. The Routing Directorate seeks to review all routing or routing-related drafts as they pass through IETF last call and IESG review, and sometimes on special request. The purpose of the review is to provide assistance to the Routing ADs. For more information about the Routing Directorate, please see http://trac.tools.ietf.org/area/rtg/trac/wiki/RtgDir

Although these comments are primarily for the use of the Routing ADs, it would be helpful if you could consider them along with any other IETF Last Call comments that you receive, and strive to resolve them through discussion or by updating the draft.

Document: draft-ietf-rtgwg-bgp-pic-00
Reviewer: Bruno Decraene
IETF LC End Date: “QA review” pre WG LC
Intended Status: Informational

Summary:

I have some minor concerns about this document that I think should be resolved before publication.

Comments:

- The document is interesting and relatively clear, but at times it feels like there is room for some reformulation/editing to improve fluidity. In particular, the learning curve is a bit steep at the beginning of the doc, as most of the concepts are introduced in 3 pages (pages 4-6) in the form of a list of terminology and a pseudo code. I would find it useful to have an overview section just after the introduction, with a high level view of the solution and a limited number of new terms.

- The text feels authoritative, while probably many terms are implementation specific. A priori, I would not expect all implementations of BGP PIC to use the same terms, and possibly not the same data structures. Maybe the text could be generalized to cover multiple implementations; or modified to describe a generalized concept (i.e. a data structure designed to share as much data as possible between elements, at the cost of additional indirections); or the document could state that it describes a specific implementation with implementation-specific terminology, data structures, and specifics. Or a combination of both (e.g. adding a generalized section describing the concept, followed by the existing sections with a statement that they are specific to one implementation).

 

Minor Issues:

 

- I find figure 2 very useful to understand the data structure. I would move it earlier in the doc, somewhere before §2.2 (with its subsequent text below), e.g. a new §2.2 "FIB data-structure".

It would need to be generalized, i.e. made example-independent. I could think of:

 

IP Leaf:      Pathlist:       IP Leaf:                Pathlist:

--------      ---------       -------                 --------

BGP NLRI ---> BGP NH1   ----> IGP IP1 (BGP NH1)  ---> IGP NH1, I1  ---> Adjacency1

              BGP NHi   --...                         IGP NHi, Ii  --..

               |                                         |

               |                                         |

                |                                         |

                v                                         v

          OutLabel Array:                           OutLabel Array:

          --------------                            --------------

          L (NLRI, NH1)                             L (IP1, NH1)

          L (NLRI, NHi)                             L (IP1, NHi)
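
To make the sharing explicit, here is a minimal sketch of the indirection
(my own illustration; all names are invented):

    # Many leaves point at one shared pathlist, so a single pathlist
    # update repairs every dependent prefix at once.
    class Pathlist:
        def __init__(self, paths):
            self.paths = paths               # e.g. resolved IGP next-hops

    class Leaf:
        def __init__(self, prefix, pathlist):
            self.prefix, self.pathlist = prefix, pathlist

    shared = Pathlist(["IGP NH1, I1", "IGP NHi, Ii"])
    leaves = [Leaf("vpn-%d" % i, shared) for i in range(100000)]

    # On failure of the first path: one in-place update, O(1) in the
    # number of BGP prefixes; all 100000 leaves see the new paths.
    shared.paths = ["IGP NHi, Ii"]
    assert all(l.pathlist.paths == ["IGP NHi, Ii"] for l in leaves)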

                                                

 

- Figure 1 could be enhanced with IGP-NH1, IGP-NH2, I1 and I2.

- Example 3 does not use the same naming convention as examples 1 and 2, which makes it harder to follow for no apparent reason. E.g. VPN labels are named VPN-L11 in examples 1 and 2, but VPN-PE21(P1) in example 3; transport labels are named LDP-L12 in examples 1 and 2, but LASBR11(PE22) and L11 in figure 3.

- §2.3.3

"The local labels of the next hops".

- All labels are locally assigned. So what do you mean by "local"?

- "next-hop" sometimes refers to IGP/connected next-hop (a priori the case here) and sometimes to BGP next-hop. I find it hard to follow. I rather use a different name (e.g; connected next-hop vs BGP next-hop)

- §3

"the hashing at the BGP level yields path 0 while the hashing at the IGP level yields path 1. In that case, the packet will be sent out of interface I1 with the label stack "LDP-L12,VPN-L21".

Does not seem to match my understanding. For "LDP-L12,VPN-L21" I would assume BGP used path index 1 and IGP used path index 0.
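
A quick sketch of my reading of the two hashing stages (label values and
hash inputs are invented; only the names come from the examples):

    bgp_pathlist   = ["BGP-NH1", "BGP-NH2"]
    vpn_out_labels = ["VPN-L11", "VPN-L21"]   # entry i follows BGP path i

    igp_paths  = {"BGP-NH1": ["I1", "I2"], "BGP-NH2": ["I1", "I2"]}
    ldp_labels = {"BGP-NH1": ["LDP-L11", "LDP-L21"],
                  "BGP-NH2": ["LDP-L12", "LDP-L22"]}

    def forward(bgp_hash, igp_hash):
        b = bgp_hash % len(bgp_pathlist)      # BGP-level path index
        nh = bgp_pathlist[b]
        i = igp_hash % len(igp_paths[nh])     # IGP-level path index
        stack = [ldp_labels[nh][i], vpn_out_labels[b]]  # outermost first
        return igp_paths[nh][i], stack

    # BGP path index 1 and IGP path index 0 give interface I1 and the
    # stack "LDP-L12,VPN-L21":
    print(forward(1, 0))   # ('I1', ['LDP-L12', 'VPN-L21'])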

 

IMHO:

OLD: "Hence ASBR22 swaps "LASBR22(PE22)" with the LDP/SR label of PE22, pushes the label of the next-hop towards PE22 in domain 2, and sends the packet along the shortest path towards PE22."

NEW: "Hence ASBR22 swaps "LASBR22(PE22)" with the LDP/SR label for PE22 advertised by the next-hop towards PE22 in domain 2, and sends the packet along the shortest path towards PE22."

(in all cases "swaps" then "pushes" would increase the label stack by 1, which is not the case.)

 

§4.1

"the useable paths in the loadinfo"

loadinfo is a proprietary FIB data structure which has not been introduced/defined. You need to either remove that term (if possible) or define it somewhere.

 

"Hence traffic restoration occurs within the time frame of IGP convergence,"

agree.

..."and, for local link failure, within the timeframe of local detection. Thus it is possible to achieve sub-50 msec convergence as described in [10] for local link failure"

IMO, this is restricted to specific cases, e.g. external (eBGP) link failure, the ECMP case, possibly IP FRR.  So possibly:

OLD: for local link failure, within the timeframe of local detection. Thus it is possible to achieve sub-50 msec convergence as described in [10] for local link failure

NEW: for local link failure, assuming a backup path has been precomputed, within the timeframe of local detection (e.g. 50ms). Examples of solutions precomputing a backup path are IP FRR [LFA], [RLFA], [MRT], [TI-LFA], or an eBGP path with a backup path [10].

 

§4

I would find it useful to indicate, for each type of failure, the number of data structures that need to be updated.

---

§4.2.2

"To avoid loops, ePE2 MUST treat any core facing path as a backup

      path, otherwise ePE2 may redirect traffic arriving from the core

      back to ePE1 causing a loop."

                 

Looks a bit under-described to me. Could you please elaborate a bit? In particular:

- if 2 PEs (PE1, PE2) are connected in a U shape to 2 Ps (P1, P2)     (P1-PE1-PE2-P2), PE1 being nominal and PE2 only used as backup: in the nominal situation, if the core network sends the traffic to PE1 via PE2 (used as a P/transit), how does PE2 know that it must send this traffic to PE1 (rather than CE2)?

- this behavior looks like an additional specific feature. How does ePE1 know that ePE2 has this feature?

---

§4.3

"  Hence if the platform supports the "unflattened" forwarding chain,

   then a single pathlist needs to be updated while if the platform

   supports a shallower forwarding chain, then two pathlists need to be

   updated."

IINM "single"  and "two" pathlist applies to the specific example. In this last sentence/summary, I'd prefer a more general statement. A priori, without digging too much in this most complex use case, it seems like :s/single/o(1)  :s/two/o(PE) . The former looks close (single vs o(1)) but IMHO there is a significant difference between 2 and o(PE) (i.e. 100s)

---

§5.1

Good paragraph. It's quite clear that the convergence time does not depend on the number of BGP prefixes, which is good. For the benefit of the reader, it would be even more interesting if, for each type of failure, the text could indicate on what it depends. e.g.  o(1), o(connected interfaces), o(PE), o(PEnominal*PEbackup)....  

--

§7

"No additional security risk is introduced by using the mechanisms proposed in this document"

In general, with such a sentence, it's difficult to evaluate whether the authors have very quickly evaluated the risk or whether this evaluation has been performed in detail. So some more text detailing which aspects have been evaluated is interesting for the reader (yet painful for the authors).

As the document describes an internal box behavior, this is difficult to evaluate and discuss. But from a bad experience, I fear that there may be an impact. Indeed, with such a structure, the FIB structure/memory is typically different between BGP prefixes and IGP prefixes. In general, the implementation is designed to support the "right" numbers of both. But given an accident or an attack, the numbers may not be "right". E.g. once upon a time, someone redistributed the BGP table into the IGP. In this case, the total number of IP prefixes in the FIB was exactly the same. But as the data structure used in the FIB was different between BGP and IGP prefixes, the FIB ran out of memory and the line card crashed (well, actually only the IP FIB, so IS-IS hello packets were still correctly sent and forwarded; as a result, traffic was permanently black-holed).

---

§ 9

OLD: that allows achieving prefix independent convergence

NEW: that allows achieving BGP-prefix-independent convergence

 

(it still depends on the number of IGP prefixes and/or BGP pathlists)

 

Nits:

 

Abstract

"via more than one path."

In this first sentence, it's not clear what path really means (e.g. cf. the terminology section where you have more than one). I guess that you mean "BGP path" (as there are also typically multiple IGP paths to reach each BGP Next Hop).

 

"The objective is achieved through organizing the forwarding chains"

"chain" does not self self explicit to me. what about :s/chains/data structure"

 

"complete transparency"

What do you mean? Transparency to what / from whom?

§1

OLD: to allow for more than one path for a given prefix

NEW: to allow for BGP to advertise more than one path for a given prefix

 

OLD: Another more common and widely deployed scenario is L3VPN with multi-homed VPN sites

NEW: Another more common and widely deployed scenario is L3VPN with multi-homed VPN sites with unique Route Distinguisher.

 

---

§1.2

"Pathlist: It is an array of paths"

"OutLabel-Array: The OutLabel-Array is a list of one or more outgoing labels "

 

So a list is defined as an array and the array is defined as a list :-).

What about using the same term, e.g. a list?

--

The OutLabel-Array is a list of one or more

      outgoing labels and/or label actions where each label or label

      action has 1-to-1 correspondence to a path in the pathlist. It

      is possible that the number of entries in the OutLabel-array is

      different from the number of paths in the pathlist and the ith

      Outlabel-Array entry is associated with the path whose path-

      index is "i".

                 

- I don't see how one can have a 1-to-1 correspondence if the number of elements is not the same.

- The last sentence could be split in two.

--             

Since the term ingress PE is defined, you could also define the term egress PE, possibly in the same sentence.

OLD: "Ingress PE, "iPE": It is a BGP speaker that learns about a

      prefix through another IBGP peer and chooses that IBGP peer as

      the next-hop for the prefix

                 

NEW:      "Ingress PE, "iPE": It is a BGP speaker that learns about a

      prefix through a IBGP peer and chooses an egress PE as the next-hop for the prefix.

                 

As a side note, the previous definition assumes that there is no Route Reflector (the iBGP peer is the BGP Next Hop).

--

§2.3

Figure 1 represents a VPN network with 3 PEs and a CE. In this context, "VPN-P1" sounds a bit like a P router. What about :s/VPN-P1/VPN-IP1 ? Same comment for IGP-P1.

--

§2.3.2

OLD: ePE2 constructs the forwarding chain depicted in Figure 1

NEW: ePE2 constructs the forwarding chain depicted in Figure 3

 

OLD: VPL-L11

NEW: VPN-L11

 

§2.3.3

OLD: can reach ASBR1

NEW: can reach ASBR11

 

OLD: The label for advertised by ASBR11 to iPE

NEW: The label advertised by ASBR11 to iPE

 

OLD: The labels for advertised by ASBR12 to iPE

NEW: The labels advertised by ASBR12 to iPE

 

OLD: The labels for advertised to iPE by ASBR11 using BGP-LU

NEW: The labels advertised  by ASBR11 to iPE using BGP-LU

---

§3

 

OLD: Let's applying the above forwarding steps to the example described in Figure 1 Section 2.3.1.

NEW: Let's apply the above forwarding steps to the example described in Figure 2 of Section 2.3.1.

 

(somewhat guessing. But in all cases, there is no figure 1 in section 2.3.1)

 

---

§4.1

IMO

OLD: As soon as the IGP convergence is effective for the BGP nhop entry, the new forwarding state is immediately available to all dependent BGP prefixes.

NEW: As soon as the IGP convergence is effective for a BGP next-hop entry, the new forwarding state is immediately available to all dependent BGP prefixes.

 

more generally

:s/nhop/next-hop

---

§4.3

:s/PE222/PE22

 

Best regards,

Bruno

 

_______________________________________________
rtgwg mailing list
rtgwg <at> ietf.org
https://www.ietf.org/mailman/listinfo/rtgwg
Acee Lindem (acee | 18 Apr 19:25 2016

Re: [Netconf] mbj review of draft-ietf-netconf-restconf-server-model-09

Hi Kent, 
I did follow up on this. The reason for “key-chain” in the model name is
that “keychain” is not a well-known compound word. Additionally, Cisco
OSs use “key chain” and Ericsson uses “key-chain”. I’d be interested in
other thoughts on this.

I did get some negative feedback with respect to adding “routing-“ to the
model name since key chains are used for other non-routing applications as
well. What are your thoughts on this given that you now see the ubiquitous
usage of key chains across vendors?

Thanks,
Acee

On 4/18/16, 12:54 PM, "Kent Watsen" <kwatsen <at> juniper.net> wrote:

>
>I discussed this naming issue with Acee (CC-ed) in the hallway at BA.  He
>said that he used "key-chain" because that is what Cisco/Redback CLI
>uses.  We then searched on "juniper key-chain" and found that JUNOS uses
>"keychain".  I'm not sure if a more exhaustive search has been made.
>
>I think the netconf draft should stick with "keychain" for now.  I'd like
>to see some discussion in the routing area if they might be better off
>using "keychain"...
>
>Kent
>
>
>
>
>
>
>On 4/8/16, 12:12 PM, "t.petch" <ietfc <at> btconnect.com> wrote:
>
>>----- Original Message -----
>>From: "Kent Watsen" <kwatsen <at> juniper.net>
>>To: "Martin Bjorklund" <mbj <at> tail-f.com>; <netconf <at> ietf.org>
>>Sent: Thursday, April 07, 2016 4:15 AM
>>
>>>
>>> Hi Martin,
>>>
>>> Thank you for your review.  Below are my responses:
>>>
>>
>><snip>
>>
>>>
>>> >o  Section 5
>>> >
>>> >  ietf-system-keychain vs. ietf-routing-key-chain
>>> >
>>> >  Is it "keychain" or "key-chain"?
>>>
>>> I've never seen "key-chain" before.  Many OSs (e.g., mac, linux,
>>openbsd, freebsd) have a utility called "keychain".
>>>
>>
>>Kent
>>
>>The IETF is riddled with them e.g.
>>
>>"   The key-chain YANG model groups several keys into a single key
>>chain."
>>
>>in draft-chen-rtgwg-key-table-yang along with 16 other YANG I-Ds that I
>>have seen lately, containing snippets such as
>>
>>     container key-chains {
>>       list key-chain-list {
>>         key "name";
>>         description
>>           "List of key-chains.";
>>         uses key-chain;
>>
>>I think that there should be a consistent spelling across the IETF.
>>
>>Tom Petch
>>
>>
>>
>>
>>
>>>
>>>
>>> >o  General remark.
>>> >
>>> >  Unless it is too much of a burden, I think it would make sense to
>>> >  move the generic tls and ssh grouping models (and keychain) into a
>>> >  separate draft.   It might also be useful with corresponding
>>> >  groupings for ssh/tls clients (which you almost already have).
>>>
>>> This will be discussed in tomorrow's meeting
>>>
>>>
>>> Thanks,
>>> Kent
>>> >
>>> _______________________________________________
>>> Netconf mailing list
>>> Netconf <at> ietf.org
>>> https://www.ietf.org/mailman/listinfo/netconf

>>

_______________________________________________
rtgwg mailing list
rtgwg <at> ietf.org
https://www.ietf.org/mailman/listinfo/rtgwg
Nitish Gupta (nitisgup | 13 Apr 16:58 2016

Re: New Version Notification for draft-nitish-vrrp-bfd-03.txt

Hi All,

We have submitted the new version of the draft. In this version we have
just updated the revision to avoid expiration.
We will address the comments in a few weeks and post another version
with the comments incorporated.
We apologize that we have not been able to incorporate the comments in
the time given by the working group members.

Thanks,
Nitish

On 13/04/16 8:25 pm, "internet-drafts <at> ietf.org" <internet-drafts <at> ietf.org>
wrote:

>
>A new version of I-D, draft-nitish-vrrp-bfd-03.txt
>has been successfully submitted by Nitish Gupta and posted to the
>IETF repository.
>
>Name:		draft-nitish-vrrp-bfd
>Revision:	03
>Title:		Fast failure detection in VRRP with BFD
>Document date:	2016-04-13
>Group:		Individual Submission
>Pages:		10
>URL:            
>https://www.ietf.org/internet-drafts/draft-nitish-vrrp-bfd-03.txt
>Status:         https://datatracker.ietf.org/doc/draft-nitish-vrrp-bfd/
>Htmlized:       https://tools.ietf.org/html/draft-nitish-vrrp-bfd-03
>Diff:           https://www.ietf.org/rfcdiff?url2=draft-nitish-vrrp-bfd-03
>
>Abstract:
>   This document describes how Bidirectional Forwarding Detection (BFD)
>   can be used to support sub-second detection of a Master Router
>   failure in the Virtual Router Redundancy Protocol (VRRP).
>
>                  
>        
>
>
>Please note that it may take a couple of minutes from the time of
>submission
>until the htmlized version and diff are available at tools.ietf.org.
>
>The IETF Secretariat
>
Jeffrey Haas | 8 Apr 15:12 2016

rtgwg key chain - empty authentication

As a followup to the mic comment on Friday afternoon of IETF 95, we should consider an explicit code point for
"empty/null" authentication.  This covers the cases where authentication fields in protocols need to be
used, but no mechanism providing authentication is in use.

BFD, however, has an unusual form of this case:
1. We support the *absence* of authentication.
2. We have work in progress to add an explicit authentication code point with no authentication, simply to take
advantage of the sequence numbers in the authentication field.
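
For concreteness, here is a sketch of what such a null-auth section could
look like on the wire, reusing the layout of the sequence-number-bearing
authentication sections of RFC 5880.  The type value below is a made-up
placeholder, not an allocated code point:

    # Hypothetical "NULL auth" section: Auth Type, Auth Len, Auth Key ID,
    # Reserved, Sequence Number.  The type value is illustrative only.
    import struct

    NULL_AUTH_TYPE = 250   # placeholder, NOT an allocated code point

    def null_auth_section(seq):
        # 1B type, 1B length, 1B key id, 1B reserved, 4B sequence number
        return struct.pack("!BBBBI", NULL_AUTH_TYPE, 8, 0, 0, seq)

    print(null_auth_section(42).hex())   # fa0800000000002a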

-- Jeff
