SCTP mis indication counting during Fast Recovery

 Hi,

 

I have come to realize that I for the SCTP TLP (SCTP Tail Loss probe additions) and SCTP FR improvements (adoptation of certain RFC6675 nextseg() features for SCTP FR) that we are working on (as presented at IETF 90) have made an assumption on the SCTP FR function which is neither aligned with our

actual SCTP implementation nor with the text of RFC4960 (/RFC4460) as written (section 7.2.4).

 

However in order to better understand the motivation for the text of RFC4960 as written,

I would appreciate very much if the wg would comment on the following formulation of RFC4960 section 7.2.4:

 

If an endpoint is in Fast Recovery and a SACK

   arrives that advances the Cumulative TSN Ack Point, the

   miss indications are incremented for all TSNs reported

   missing in the SACK.

 

Especially on why the text DOES NOT read:

 

If an endpoint is in Fast Recovery and a SACK

   arrives that advances the Cumulative TSN Ack Point, the

   miss indications are incremented for all TSNs still

   missing up to and including the Fast Recovery Exit Point.

 

 

The basic philosophi being that when the cum ack advances, then the newly cum ack’ed TSN has

been assumed lost, has been retransmitted and consequently the mis indication count should go up for

all still missing TSNs up to and including the Fast Recovery Exit Point.

 

It is clear that the above aproach by no means is bullet proof in the sense that it may very well be that the cum ack advances

due to receipt of SACK without this in fact being a result of the receipt by peer of a fast retransmitted TSN, but this is the situation

also with the approach written into RFC4960.

 

Many thanks in advance.

 

BR, Karen

 

 

 

SCTP ndata SCTP-PR priority

Hi Michael,

 

I think that it could be beneficiary in the ndata draft to relate to (describe) how the ndata SS-PRIORITY scheduling may be used together

with the SCTP-PR priority policy to support “accelerated” prioritized delivery in which the priority messages will not only be able to expel lower priority data from the SND buffer but, also be able to outrace lower priority messages (set to be send on lower priority streams) from  a transmission perspective.

 

Apart from the relevance from a usecase perspective then I think that such a description could be relevant also in order to emphasize the difference

in between the message priority of SCTP-PR and the stream priority of ndata.  For example I think that it is particularly be relevant to state explicitly that the priority given to stream does not carry over to (has no relation to) the SCTP-PR message priority or the messages send on the stream.

 

Alternatively – juts for the sort of it - but this I think would be an addition to/a change of SCTP-PR, RFC6458 and ndata-draft – then one could also look to that the default SCTP-PR message priority be inherited from the stream priority SS-PRIORITY or one may ask for a special version of the stream scheduling SS_PRIORITY* in which this be the case. This would alleviate SCTP Users from having to pass also the prinfo down in case the priority in wished managed simply by differentiation on streams. The stream priority has advantages from a receive perspective in that it would be feasible to  implement priority on a stream perspective on the reading side. This is not feasible for the per message prioritization.

 

Thanks.

 

BR, Karen

 

 

 

*****************************************************************************************

Karen Egede Nielsen

Software Architect, Ph.D.

 

Tieto Denmark A/S

R&D, Telecom & Media

Åhave Parkvej 27, 8260 Viby J, DK-Denmark

Direct Phone / Mobile +45 25134336

E-mail: karen.nielsen <at> tieto.com

*****************************************************************************************

www.tieto.com 

 

Please note: The information contained in this message may be legally privileged and confidential and protected from disclosure. If the reader of this message is not the intended recipient, you are hereby notified that any unauthorised use, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify us immediately by replying to the message and deleting it from your computer. Thank You.

 

 

 

 

 

Gorry Fairhurst | 14 Oct 09:51 2014
Picon
Picon

WGLC draft-ietf-tsvwg-port-use-05 : Working Group Last Call for comments due 29/10/2014

This email announces the start of a working group last call to
confirm publication of draft-ietf-tsvwg-port-use,
"Recommendations for Transport Port Number Uses".  This document has
received significant feedback during the first WGLC review and has now 
been updated in response to this feedback. It is now thought to be ready 
to proceed to be published as a BCP.

Please send any comments to the TSVWG list. We are interesed in feedback 
from any who have read the latest version, including those who commented 
in the first WGLC.

The draft is available at:
https://tools.ietf.org/html/draft-ietf-tsvwg-port-use-05

This last call will run for TWO weeks, ending 29th October.

James, David, and Gorry
(TSVWG Chairs)
tsvwg-chairs <at> ietf.org

Weixinpeng | 8 Oct 09:14 2014

FW: New Version Notification for draft-wei-tsvwg-tunnel-congestion-feedback-03.txt

Hi all,
A new version of network based congestion control draft has been submitted.

In the last Toronto meeting, we presented our simulation result about the impacts of network based
congestion control on end-to-end congestion control and the result seems acceptable to the people.

In this new version, several small issues that people may be interested in are clarified, including what
information would be conveyed from tunnel egress to ingress for congestion control, how tunnel ingress
could use the information to reduce congestion level in tunnel.

So now we want to know whether you have any comments or suggestions for this? 

Best Regards,
Xinpeng

>-----Original Message-----
>From: internet-drafts <at> ietf.org [mailto:internet-drafts <at> ietf.org]
>Sent: Wednesday, October 08, 2014 12:04 PM
>To: Weixinpeng; Lingli Deng; Zhulei (MBB Research); Zhulei (MBB Research);
>Weixinpeng; Lingli Deng
>Subject: New Version Notification for
>draft-wei-tsvwg-tunnel-congestion-feedback-03.txt
>
>
>A new version of I-D, draft-wei-tsvwg-tunnel-congestion-feedback-03.txt
>has been successfully submitted by Xinpeng Wei and posted to the IETF
>repository.
>
>Name:		draft-wei-tsvwg-tunnel-congestion-feedback
>Revision:	03
>Title:		Tunnel Congestion Feedback
>Document date:	2014-10-08
>Group:		Individual Submission
>Pages:		20
>URL:
>http://www.ietf.org/internet-drafts/draft-wei-tsvwg-tunnel-congestion-feedb

>ack-03.txt
>Status:
>https://datatracker.ietf.org/doc/draft-wei-tsvwg-tunnel-congestion-feedback/

>Htmlized:
>http://tools.ietf.org/html/draft-wei-tsvwg-tunnel-congestion-feedback-03

>Diff:
>http://www.ietf.org/rfcdiff?url2=draft-wei-tsvwg-tunnel-congestion-feedback-

>03
>
>Abstract:
>   This document describes a mechanism to calculate congestion of a
>   tunnel segment based on RFC 6040 recommendations, and a feedback
>   protocol by which to send the measured congestion of the tunnel from
>   egress to ingress router. A basic  model for measuring tunnel
>   congestion and feedback is described, and a protocol for carrying the
>   feedback data is outlined.
>
>
>
>
>
>Please note that it may take a couple of minutes from the time of submission
>until the htmlized version and diff are available at tools.ietf.org.
>
>The IETF Secretariat

gorry | 3 Oct 11:26 2014
Picon
Picon

Comments on SCTP draft-ietf-tsvwg-sctp-failover-05

https://tools.ietf.org/html/draft-ietf-tsvwg-sctp-failover-05

As a Chair, I think the above draft seems in good shape from a protocol
perspective, if people see other ways to improve the protocol - it would
be good to know at this stage.

The authors have requested that this be considered as a PS. I have
therefore given this a more critical read - especially carefully looking
at the language, and suggest they consider the comments below and re-issue
the draft.  If we are to consider the draft as PS, then  I think a section
needs to be added to motivate this change (which *WILL* be confirmed on
this list after this revision).

Comments follow on the document (with no intended change to thee
protocol), please consider these - I'd love to hear from others on the
list if they also have any comments on this version!

Gorry
(tsvwg chair)

—

Section 4: (NiT in language)

/In order to address the issues described in Section 3, We propose to
   update SCTP path management scheme as follows./
to
/To address the issues described in Section 3, this section
   updates SCTP path management:/
-If this is to be considered as a PS, it /specifies/ the update.

Section 4.1: (language, there is no ordering so avoid this usage):
/In order to minimize performance impact during failover/
to
/To minimise the performance impact during failover/

Section 4.1 (structure)

- If I understand correctly, the first two paras, including the 2 bullets
are actually informative discussion to introduce the method, if this is
the case could these be moved to section 4, so thats action 4.1 includes
the RFC2119 protocol update.

Section 4.1: (NiT in language)
/tweaking/tuning/

/This proposal introduces/This new method introduces/

Requirement  1 (RFC2119)
/The recommended value of PFMR = 0  when quick failover is used.  /
- is this /RECOMMENDED/ ?

Requirement 3: (NiT in language)
/PF state with least error count/
/PF state with the lowest error count/

/In case of multiple PF/
/When there are multiple PF/

Requirement 4: (Clarity)

/This
        specification recommends that heartbeats be send to PF
        destinations independently from whether the Path Heartbeat
        function (Section 8.3 of [RFC4960]) is enabled for the
        destination address or not./
- This appears at the end of the requirement. I'm not sure what is
intended, could it be that this
RECOMMENDS the use of heartbeats with PF, in which case I'd have expected
this requirement perhaps to be stated first?

Requirement 7: (NiT in language)

/state with least error count/
/state with the lowest error count/

/In  case of multiple destinations with same error count in inactive
        state/
/When there are multiple destinations with the same error count in the
inactive state/

Requirement 4: (RFC2119)

/In order to support this
        prescription a sender SHOULD allow for increment of the
        destination error counters up to some reasonable limit above
        PMR+1/
/Therefore, a sender SHOULD allow for incrementing the
        destination error counters up to some reasonable limit above
        PMR+1/
- I can see the goal of reasonable limit, but is this actually a "limit" 
- that stops further incrementing or "permits" the value to be incremented
beyond PMR+1?

/A  sender MAY choose to deploy other strategies than the above/
- good to state this, but this is the second part of a SHOULD statement
explaining options, as such I do not think it needs an upper case /MAY/
and suspect that /may/ will be sufficient.

Requirement 8: (NiT in language)

/ACKs for chunks which have been transmitted to multiple
        destinations (i.e., a chunk which has been retransmitted to a
        different destination than the destination to which the chunk
        was first transmitted) /
/ACKs for chunks that have been transmitted to multiple
        destinations (i.e., a chunk that has been retransmitted to a
        different destination than the destination to which the chunk
        was first transmitted) /
- change /which/ to /that/

Use upper case for /ack/?

/In this respect then it is specified
        that bytes of a newly acknowledged chunk which has been
        transmitted to multiple destinations SHOULD, when the conditions
        for such contribution is fulfilled following the prescriptions
        of Section 7.2 of [RFC4960], contribute to the congestion window
        growth towards the destination to which the chunk was last
        transmitted. /
- This clause is rather complex and long, please rephrase!

/A SCTP sender MAY apply a different approach for
        both the error count handling as well as the congestion control
        growth handling does it have unequivocally information as to
        which destination (including multiple destinations) the chunk
        reached.  /

- This clause is rather complex, and hard to parse, please rephrase!

Requirement 10: (Clarity)

/SCTP SHOULD provide the means to expose the PF state /
- Is this the SCTP stack should expose to the ULP, please clarify, and use
language similar to other SCTP RFCs.

/Active state ./ - remove space before stop,

/When  doing such SCTP MUST /
/When  this, the SCTP stack MUST /

/suppress exposure of PF state to ULP.  /
/suppress exposure of PF state to the ULP.  /

- I found this requirement not so clearly explained, and rather hard to
comprehend from the text.

Section 4.2: (RFC2119, Clarity)

/Post failover then, by [RFC4960] behavior, an SCTP sender migrates
   the traffic back to the original primary destination once this
   destination becomes active anew.  /
- This doesn’t seem to make sense.

No 1

/Any setting of PSMR < PFMR
       MUST be rejected by the implementation./
- MUST be rejected seems odd. Can you phrase this the other way around?
The PSMR MUST be set so that PSMAR>=PFMR.
- This seems to contradict the previous SHOULD.

No 2 and 3 do not introduce RFC 2119 requirements. Is this intentional, if
so why?

No 4
/support set of PSMR = PFMR/
/support the setting of PSMR = PFMR/

/ We recommend that SCTP-PF sticks to the standard RFC4960 switchback
   behavior as default/
- is this intended to be:
/This specifications RECOMMENDS a default configuration that uses standard
RFC4960 switchback./

/However in order to support//However, to support/

- does this section need the MAYs in capital letters, since it follows a
SHOULD?

—

Section 6:

I’m happy with this clause, if it is correct. For clarity I’d prefer the
section is prefixed by something like:
“Secutity considerations for the use of SCTP are discussed in RFCxxx and
RFCyyyy.”

Appendices: (Structure)

Two appendices are provided, but seem not to be described in the body of
the specification. - please either add references were appropriate to the
appendices or remove them.

Proposed change of status

I would expect such a section to b marked for removal before publication
by RFC-Ed (since this is ONLY an AD/IESG decision - that provides a couple
of short paras that document the evolution of the draft from EXP to PS,
and motivate that the experimental considerations are no longer an issue.
I believe the authors have clearly put this argument in the past, and hope
that this is not a difficult section to add.

gorry | 28 Sep 21:23 2014
Picon
Picon

WGLC conclusion for draft-ietf-tsvwg-sctp-prpolicies

The above draft completed WGLC with minor comments, and resulted in a new
revision that just been published. If you commented during the WGLC,
please check that this revision resolves your issues, or send a message to
the tsvwg list.

If there are no new comments, the chairs plan to submit this to our AD
with the intended status of PS (including an INFO section describing the
socket API).

Best wishes,

Gorry
(tsvwg co-chair)

internet-drafts | 28 Sep 19:47 2014
Picon

I-D Action: draft-ietf-tsvwg-sctp-prpolicies-04.txt


A New Internet-Draft is available from the on-line Internet-Drafts directories.
 This draft is a work item of the Transport Area Working Group Working Group of the IETF.

        Title           : Additional Policies for the Partial Reliability Extension of the Stream Control Transmission Protocol
        Authors         : Michael Tuexen
                          Robin Seggelmann
                          Randall R. Stewart
                          Salvatore Loreto
	Filename        : draft-ietf-tsvwg-sctp-prpolicies-04.txt
	Pages           : 10
	Date            : 2014-09-28

Abstract:
   This document defines two additional policies for the Partial
   Reliability Extension of the Stream Control Transmission Protocol
   (PR-SCTP) allowing to limit the number of retransmissions or to
   prioritize user messages for more efficient send buffer usage.

The IETF datatracker status page for this draft is:
https://datatracker.ietf.org/doc/draft-ietf-tsvwg-sctp-prpolicies/

There's also a htmlized version available at:
http://tools.ietf.org/html/draft-ietf-tsvwg-sctp-prpolicies-04

A diff from the previous version is available at:
http://www.ietf.org/rfcdiff?url2=draft-ietf-tsvwg-sctp-prpolicies-04

Please note that it may take a couple of minutes from the time of submission
until the htmlized version and diff are available at tools.ietf.org.

Internet-Drafts are also available by anonymous FTP at:
ftp://ftp.ietf.org/internet-drafts/

Caitlin Bestler | 11 Sep 00:24 2014

draft-bestler-transactional-subset-multicast-00


TSVWG                                                    C. Bestler, Ed.
Internet-Draft                                                  R. Novak
Intended status: Experimental                                    Nexenta
Expires: March 14, 2015                               September 10, 2014


           Creation of Transactional Subset Multicast Groups
            draft-bestler-transactional-subset-multicast-00

Abstract

   This memo presents techniques for controlling the membership of
   multicast groups which are constrained to be a subset of a pre-
   existing multicast group, where such subset groups are only used for
   short duration transactions which are multicast to a subset of the
   larger multicast group.

Editor's Note

   The proper working group for this draft has not yet been determined.
   Alternate working groups include PIM and INT.

   Nexenta has been developing a multicast based transport/storage
   protocol for Object Clusters at Nexenta.  This applies multicast
   datagrams to creation and replication of Objects such as those
   supported by the Amazon Simple Storage Service ("S3") protocol or the
   OpenStack Object Storage service ("Swift").  Creating replicas of
   object payload on multiple servers is an inherent part of any storage
   cluster, which makes multicast addressing very inviting.  There are
   issues of congestion control and reliability to settle, but new Layer
   2 capabilities such as DCB (Data Center Bridging) make this doable.

   However, we found that the existing protocols for controlling
   multicast group membership (IGMP and MLD) are not suitable for our
   storage application.  The Authors doubt this is unique to a single
   application.  It should apply to many clusters that have a need to
   distribute transactional messages to dynamically selected subsets of
   a group within a cluster to multiple known recipients.

   Computational clusters using MPI are also potential users of
   transactional multicasting.  Inter-server replication in a pNFS
   cluster is another.

   These are just examples of synchronizing cluster data where the
   synchronization does not replicate all of the shared data with the
   entire cluster.  But these are merely initial hunches, working group
   feedback is expected to refine characterization of the applicability
   of transactional subset multicast groups.



Bestler & Novak          Expires March 14, 2015                 [Page 1]
Internet-Draft    Transactional Subset Multicast Groups   September 2014


   This submission, and ensuing discussion of this draft and its
   successors will make reference to specific applications, including
   the Nexenta Replicast protocol for multicast replication in Nexenta's
   Cloud Copy-on-Write (CCOW) Object Cluster used in the NexentaEdge
   product.  Such examples are merely for illustrative purposes.  Any
   IETF standardization of the Replicast storage protocols would be done
   via the Storm or NFS groups, and would require adoption of a
   definition of Object Storage as a service before standardizing any
   specific protocol for providing Object Storage services.

   At this stage in drafting message formats have not yet been set for
   the standardized version of the protocol.  The pre-standard version
   was limited to a single L2 physical network, which would be an
   inappropriate limitation for an IETF standard.  Working Group
   feedback on the format of these messages will be sought during the
   consensus building process.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on March 14, 2015.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.



Bestler & Novak          Expires March 14, 2015                 [Page 2]
Internet-Draft    Transactional Subset Multicast Groups   September 2014


Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   4
     1.1.  Requirements Notation . . . . . . . . . . . . . . . . . .   4
   2.  Motivation  . . . . . . . . . . . . . . . . . . . . . . . . .   4
   3.  An Example Application  . . . . . . . . . . . . . . . . . . .   5
   4.  Generalized Usage of Transactional Subset Multicast Groups  .   6
   5.  Transactional Subset Multicast Groups . . . . . . . . . . . .   6
     5.1.  Definition  . . . . . . . . . . . . . . . . . . . . . . .   6
       5.1.1.  Dynamic Specification versus Dynamic Selection  . . .   7
       5.1.2.  Push vs. Join . . . . . . . . . . . . . . . . . . . .   7
     5.2.  Applicability . . . . . . . . . . . . . . . . . . . . . .   8
       5.2.1.  How is the Group Selected?  . . . . . . . . . . . . .   8
       5.2.2.  What are the endpoints that receive the messages? . .   9
       5.2.3.  What is the duration of the group?  . . . . . . . . .   9
       5.2.4.  Who are the members of the group? . . . . . . . . . .  11
       5.2.5.  How much latency does the application tolerate? . . .  11
       5.2.6.  What must be done to maintain the Group?  . . . . . .  12
   6.  Forwarding Control Agent  . . . . . . . . . . . . . . . . . .  12
     6.1.  Network Topology  . . . . . . . . . . . . . . . . . . . .  13
     6.2.  Isolated VLANs Strategy . . . . . . . . . . . . . . . . .  13
   7.  Forwarding Control Agent Methods  . . . . . . . . . . . . . .  14
     7.1.  Dynamically Pushed Subset Groups  . . . . . . . . . . . .  14
     7.2.  Persistent Transactional Subset Groups  . . . . . . . . .  15
   8.  Relationship to Existing Multicast Membership Protocols . . .  16
   9.  Control Protocol  . . . . . . . . . . . . . . . . . . . . . .  17
   10. Forwarding Control Agent Methods  . . . . . . . . . . . . . .  17
     10.1.  Create Transactional Multicast Address Block . . . . . .  17
     10.2.  Release Transactional Multicast Address Block  . . . . .  18
     10.3.  Set Dynamic Transactional Multicast Group Membership
            IPV6 . . . . . . . . . . . . . . . . . . . . . . . . . .  18
     10.4.  Set Dynamic Transactional Multicast Group Membership
            IPV4 . . . . . . . . . . . . . . . . . . . . . . . . . .  19
     10.5.  Set Persistent Transactional Multicast Groups IPv6 . . .  19
     10.6.  Set Persistent Transactional Multicast Groups IPv4 . . .  20
     10.7.  Refresh Persistent Transactional Multicast Group . . . .  21
   11. Security Considerations . . . . . . . . . . . . . . . . . . .  22
   12. IANA Considerations . . . . . . . . . . . . . . . . . . . . .  23
   13. Summary . . . . . . . . . . . . . . . . . . . . . . . . . . .  23
   14. References  . . . . . . . . . . . . . . . . . . . . . . . . .  23
     14.1.  Informative References . . . . . . . . . . . . . . . . .  23
     14.2.  Normative References . . . . . . . . . . . . . . . . . .  24
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  24








Bestler & Novak          Expires March 14, 2015                 [Page 3]
Internet-Draft    Transactional Subset Multicast Groups   September 2014


1.  Introduction

   Existing standards for controlling the membership of multicast groups
   can be characterized as being Join-driven.  These include
   [RFC3376],[RFC3810], [RFC4541] and [RFC4604].  Due to their inherent
   latency these techniques prove to be unsuitable for maintaining large
   sets of related multiast groups.  This memo details a new method of
   maintaining such large sets of related multicast groups when they are
   all subsets of a single master reference group.  This is not a
   restriction for most cluster-oriented applications which could use
   transactional multicasting.

   Transactional Subset Multicasting defines techniques that extends
   existing control of a reference multicast group to a potentially
   large set of multicast addresses used with a VLAN within each local
   subnet that the reference multicast group reaches.

   This specification makes no modifications to the forwarding of
   multicast packets nor to the communications between mrouters.  New
   methods are defined to set Layer 2 multicast forwarding rules on
   switches within each of the relevant Layer 2 subnets.

1.1.  Requirements Notation

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2.  Motivation

   Transactional Subset Multicast groups are maintained within each
   VLAN.  A 'Forwarding Control Agent' is defined within each VLAN that
   is responsible for applying the forwarding information known for a
   reference multicast group to efficiently set layer 2 multicast
   forwarding rules within each local network.

   The functionality of the Forwarding Control Agent is best understood
   as extending the functionality of IGMP/MLD Snooping (See [RFC4541]).

   An IGMP/MLD snooper interprets IGMP (see [RFC3376]) or MLD (see
   [RFC3810]) messages to translate their Layer 3 objectives into Layer
   2 multicast forwarding rules.

   A Forwarding Control Agent interprets new messages defined in this
   specification for a newly defined class of transactional subset
   multicast groups into the same Layer 2 multicast forwarding rules.
   Strategies for implementing Forwarding Control Agents would include




Bestler & Novak          Expires March 14, 2015                 [Page 4]
Internet-Draft    Transactional Subset Multicast Groups   September 2014


   extending IGMP/MLD snooping implementations or building the
   Forwarding Control Agent external to the existing L2 switch software.

   The per transaction costs of using such groups are far lower than
   with the existing methods.  The ongoing maintenance work for
   multicast forwarding elements is limited to the reference multicast
   group, it is not replicated for each of the subset transactional
   multicast groups.

3.  An Example Application

   The Replicast (see [Replicast]) usage of transactional subset
   multicasting involves:

   o  Taking a Cryptographic Hash of each chunk to be stored.  This
      "hash id" is used with a distributed hash table to determine a
      conventional multicast group which will be used to negotiate
      placement of the chunk.  This is the reference multicast group.
      Replicast refers to it as a "Negotiating Group".

   o  Multicasting a request to put the chunk to the reference multicast
      group.  Receiving storage nodes will respond with a bid on when
      they could store that chunk, or an indication that they already
      have that chunk stored.  Each of the storage nodes is offering a
      provisional reservation of its input capacity for a specific time
      window.

   o  Assuming that the chunk is not already stored, selecting the best
      responses to make a transactional subset group.  Determination of
      'best' typically is driven by the earliest possible completion of
      the transaction, but may factor the current available storage
      capacity on each of the storage nodes as well.

   o  Form or select a "rendezvous group" which will be used to transfer
      the chunk.  When the core network is non-blocking, the transfer
      will be able to proceed at close to full wire speed at the
      reserved time because each of the selected storage nodes has
      reserved its input capacity for bulk payload exclusively.  A
      multicast message to the reference group informs both those
      selected and those not selected for the rendezvous transfer.
      Those not selected will release the provisional reservation.

   o  At the designated time, multicast the chunk payload to the
      transactional subset multicast group.

   o  Each recipient validates the cryptographic hash of the received
      data, and unicasts a positive or negative acknowledgement to the
      sender.



Bestler & Novak          Expires March 14, 2015                 [Page 5]
Internet-Draft    Transactional Subset Multicast Groups   September 2014


   o  If sufficient valid copies have been positively acknowledge, the
      transaction is complete.  Otherwise it is retried.

4.  Generalized Usage of Transactional Subset Multicast Groups

   Beyond a specific application, the generalized potential for dramatic
   savings is that transactional messaging within a cluster is a
   radically different use-case from traditional multicast.  The set of
   factors that differentiates this class of applications can be
   examined through a series of questions:

   o  How is the group Selected?  Section 5.2.1

   o  What are the endpoints that receive the messages?  Section 5.2.2

   o  What is the duration of the group?Section 5.2.3

   o  Who are the potential members of the group?  Section 5.2.4

   o  How much latency does the application tolerate?  Section 5.2.5

   o  What must be done to maintain the group?  Section 5.2.6

5.  Transactional Subset Multicast Groups

5.1.  Definition

   A Transactions Subset Multicast Group is a multicast group which:

   o  Is derived from a pre-existing multicast group created by means
      independent of this standard.  The membership of this derived
      group is a subset of the reference existing multicast group.

   o  Has a multicast group address which is part of a block allocated
      for transactional multicast groups.

   o  Will only be used for the duration of a transaction.  A network
      failure or re-configuration during the transaction will require an
      upper layer retry of the transaction.  Transactional Subset
      Multicast groups are not suitable for streaming of content.
      Transactional subset multicast groups may be persistent, in that
      the same group continues to exist and be used for a series of
      transactions.  But each message sent to the group is part of a
      single short duration transaction.







Bestler & Novak          Expires March 14, 2015                 [Page 6]
Internet-Draft    Transactional Subset Multicast Groups   September 2014


5.1.1.  Dynamic Specification versus Dynamic Selection

   There are two basic strategies for managing the membership of subset
   multicast groups:

   o  Dynamic Specification: The selected members join a group that had
      been pre-selected for the transaction.

   o  Dynamic Selection: A pre-existing group is selected to match the
      subset desired.  That group is allocated for this purpose and used
      for the transaction.

   These two strategies can also be combined to form a hybrid strategy.
   If there is a pre-existing group for the desired membership list it
   is allocated and used, otherwise an available group is allocated and
   re-configured to have the required membership.

5.1.2.  Push vs. Join

   Existing methods for managing membership of a multicast group can be
   characterized as Join protocols.  The receivers may join the group,
   or subscribe to a specific source within a group, but the receivers
   of multicast messages control their reception of multicast messages.

   This model is well suited for multimedia transmission where the
   sender does not necessarily know the full set of endpoints receiving
   its multicast content.  In many cluster application the sender has
   determined the set of receivers.  Requiring the sender to communicate
   with the recipients so that they can Join the group adds latency to
   the entire transaction.

   However, there would be a serious security concern if transactional
   multicasting is not limited to transactional subset multicasting.
   Requiring that every member of a subset multicast group already be a
   member of a reference multicast group ensures that no new method of
   sending traffic is being created.  Without this guarantee a denial-
   of-service attacker could simply push a multicast group membership
   listing 1000 members, then flood that multicast group.  The amount of
   traffic delivered to the aggregate destinations would be multiplied
   by a factor of 1000.

   Transactional subset multicasting is defined to eliminate the latency
   required for Join-directed multicast group membership, while avoiding
   creating a new attack vector for denial-of-service flooding.







Bestler & Novak          Expires March 14, 2015                 [Page 7]
Internet-Draft    Transactional Subset Multicast Groups   September 2014


5.2.  Applicability

   Transactional Subset Multicast Groups are applicable for applications
   that want to reduce overall latency by reducing the number of round-
   trips required for their transactions when identical content must be
   delivered to multiple cluster members, but the selected members are a
   subset of a larger group that must be dynamically selected.

   Parallel processing of payload and/or storage of payload are the
   primary examples of such a pattern of communications.

   Examples of such applications include:

   o  Computational Clusters, particularly those using MPI (see [MPI])

   o  Storage applications, including:

      *  pNFS (See [RFC5661]).

      *  Amazon Simple Storage Service (S3) (See [AmazonS3]).

      *  OpenStack Object Storage (Swift) (See [Swift]).

   Dynamic selection of subsets ultimately enables multiple concurrent
   transfers to occur, which would not have been possible if the message
   had been sent to the entire reference multicast group.  Applications
   with relatively small payload to be multicast may find it easier to
   use simple multicast and slightly over-deliver the message.

5.2.1.  How is the Group Selected?

   In Join-directed multicasting the membership of a multicast group is
   controlled by the listeners joining and leaving the group.  The
   sender does not control or even know the recipients.  This matches
   the multicast streaming use-case very well.  However it does not
   match a cluster that needs to distribute a transactional message to a
   subset of a known cluster.

   The target group is also assumed to be stable for a long sequence of
   packets, such as streaming a video.  The targeted applications direct
   transactions to a subset of a stable group.

   One example of the need to distribute a transactional message to a
   subset of a known cluster is replication of data within an object
   cluster.  A set of targets has been selected through an higher layer
   protocol.  Joi-directed group setup here adds excessive latency to
   the process.  The targets must be informed of their selection, they
   must execute IGMP joins and confirm their joining to the source



Bestler & Novak          Expires March 14, 2015                 [Page 8]
Internet-Draft    Transactional Subset Multicast Groups   September 2014


   before the multicast delivery can begin.  Only replication of large
   storage assets can tolerate this setup penalty.

   A distributed computation may similarly have data that is relevant to
   a specific set of recipients within the cluster.  Performing the
   distribution serially to each target over unicast point-to-point
   connections uses excessive bandwidth and increases the transactions'
   latency.  It is also undesirable to incur the latency of Join-driven
   multicast group setup.

   This specification creates two methods for a sender to form or select
   a multicast group for transactional purposes.  With these methods no
   further transmissions are required from the selected targets until
   the full transfer is complete.

   The restriction that the targeted group must be a subset of an
   existing multicast group is necessary to prevent a denial-of-service
   flooding attack.  Transactional multicast groups that were not
   restricted to being a subset of an existing multicast group could be
   used to flood a large number of targets that were unprepared to
   process incoming multicast datagrams.

5.2.2.  What are the endpoints that receive the messages?

   The endpoints of the transactional messages may be higher layer
   entities, where each network endpoint supports multiples instances of
   the higher layer entities.  For example, a storage application may
   have IP addresses associated with specific virtual drives, as opposed
   to an IP address associated with a server that hosted multiple
   virtual drives.

   Having an IP address for each drive makes migrating control over that
   drive to a new server easier, and allows the servers to direct
   incoming payload to the correct drive.

5.2.3.  What is the duration of the group?

   Join-directed multicasting is designed primarily for the multicast
   streaming use-case.  A group has an indefinite lifespan, and members
   come and go at any time during this lifespan, which might be measured
   in minutes, hours or days.

   Transaction multicasting is designed to support applications where a
   transaction lasts for microseconds or milliseconds (possibly even
   seconds).  Transactional multicasting seeks to identify a multicast
   group for the duration of sending a set of multicast datagrams
   related to a specific transaction.  Recipients either receive the
   entire set of datagrams or they do not.  Multicast streaming



Bestler & Novak          Expires March 14, 2015                 [Page 9]
Internet-Draft    Transactional Subset Multicast Groups   September 2014


   typically is transmitting error tolerant content, such as MPEG
   encoded material.  Transaction multicasting will typically transmit
   data with some form of validating signature and transaction
   identifier that allows each recipient to confirm full reception of
   the transaction.

   This obviously needs to be combined with applicable congestion
   control strategies being deployed by the upper layer protocols.  The
   Nexenta Replicast protocol only does bulk transfers against reserved
   bandwidth, but there are probably as many solutions for this problem
   as there are applications.  Replicast relies upon IEEE I802.1
   Datacenter Bridging (DCB) protocols such as Priority Flow Control and
   Congestion Notification to provide no-drop service.  The DCB
   protocols deal with the fine timing of congestion avoidance, but
   require higher layer transport or application protocols to keep the
   sustained traffic rates below the sustained capacity.  Creating
   explicit reservations for bulk transfers is the main method for
   accomplishing this.

   The relevant DCB protocols include:

   o  Congestion Notification:[IEEE.802.1Qau-2011]

   o  Enhanced Transmission Selection:[IEEE.802.1Qaz-2011]

   o  Priority Flow Control[IEEE.802.1Qbb-2011]

   The important distinction between Replicast and conventional
   multicast applications is that there is no need to dynamically adjust
   multicast forwarding tables during the lifespan of a transaction,
   while IGMP and MLD are designed to allow the addition and deletion of
   members while a multicast group is in use.  This distinction is not
   unique to any single storage application.  Transactional replication
   is a common element in cluster protocol design.

   The limited duration of a transactional multicast group implies that
   there is no need for the multicast forwarding element to rebuild its
   forwarding tables after it restarts.  Any transaction in progress
   will have failed, and been retried by the higher-layer protocol.
   Merely limiting the rate at which it fails and restarts is all that
   is required of each forwarding element.

   Another implication is that there is no need for the forwarding
   elements to rebuild the membership list of a transactional multicast
   group after the forwarding element has been reset.  The transactions
   using the forwarding element will all fail, and be retried by a
   higher layer transport or application protocol.  Assuming that




Bestler & Novak          Expires March 14, 2015                [Page 10]
Internet-Draft    Transactional Subset Multicast Groups   September 2014


   forwarding elements do not reset multiple times a minute this will
   have very limited impact on overall application throughput.

   The duration of a transaction is application specific, but inherently
   limited.  A failed transaction will be retried at the application
   layer, so obviously it has a duration measured in seconds at the
   longest.

5.2.4.  Who are the members of the group?

   Join-directed multicasting allows any number of recipients to join or
   leave a group at will.

   Transactional multicast requires that the group be identified as a
   small subset of a pre-existing multicast group.

   Building forwarding rules that are a subset of forwarding rules for
   an existing multicast group can be done substantially faster than
   creating forwarding rules to arbitrary and potentially previously
   unknown destinations.

   Some applications, including Object Clusters, benefit considering the
   members to be higher layer entities (such as virtual drives) rather
   than simply being the base IP address of the servers that host the
   higher layer entities.  Doing so allows groups to be defined for each
   set of logical endpoints, not merely sets of physical endpoints.  An
   Object Cluster, for example, could have two different groups ([A,B,C]
   vs [A,B,D]) even when the destinations are the same Layer 2 MAC
   address (i.e., C and D are hosted by the same server).  This allows
   the server hosting both C and D to distinguish which entity is
   addressed using the Destination IP Address.

5.2.5.  How much latency does the application tolerate?

   While no application likes latency, multicast streaming is very
   tolerant of setup latency.  If the end application is viewing or
   listening to media, how many msecs are required to subscribe to the
   group will not have a measurable impact to the end user.

   For transactions in a cluster, however, every msec is delaying
   forward progress.  The time it takes to do an IGMP join would be a
   significant addition to the latency of storing an object in an object
   cluster using a relatively fast storage technology (such as SSD,
   Flash or Memristor).







Bestler & Novak          Expires March 14, 2015                [Page 11]
Internet-Draft    Transactional Subset Multicast Groups   September 2014


5.2.6.  What must be done to maintain the Group?

   The Join-directed multicast protocols specify methods for the
   required maintenance of multicast groups.mMulticast forwarders,
   switches or mrouters, must deal with new routes and new locations for
   endpoints.

   The reference multicast group will still be maintained by the
   existing Join-directed multicast group protocols.  The existing IGMP/
   MLD snooping procedures will keep the L2 multicasting forwarding
   rules updated as changes in the network topology are detected.
   Nothing in this specification changes the handling of the reference
   multicast group.

   Transactional subset multicast groups are defined to be used only for
   short transactions, allowing them to piggy-back on the maintenance of
   the reference multicast group.

6.  Forwarding Control Agent

   The Forwarding Control Agent is responsible for translating
   forwarding control messages as defined in Section 7 into Layer 2
   multicast forwarding for one or more subnets associated with a single
   physical layer 2 subnet.

   Each Forwarding Control Agent can be though of as extending the IGMP/
   MLD snooping capabilities of an L2 forwarding element.  It is
   translating the forwarding control agent messages into configuration
   of L2 multicast forwarding just as an IGMP/MLD snooper translates
   IGMP/MLD messages into configuration of Layer 2 multicast forwarding.
   This MAY be done external to the existing implementation, or it may
   be integrated with the IGMP/MLD snooper implementation.

   Each Forwarding Control Agent:

   o  MUST Accept authenticated forwarding control agent messages
      controlling the creation and membership of Transactional Subset
      Multicast Groups within the context of a specified VLAN.

   o  MUST support at least one VLAN.

   o  MAY support multiple VLANs.

   o  MUST update the controlled Layer 2 forwarding element's multicast
      forwarding rules to reflect the subset specified for the group.

   o  MUST Update the controlled L2 forwarding elements multicast
      forwarding rules to reflect changes in the mapping of IP addresses



Bestler & Novak          Expires March 14, 2015                [Page 12]
Internet-Draft    Transactional Subset Multicast Groups   September 2014


      to L2 MAC addresses between transactions for persistent
      transactional suset multicast groups when informed of a prior
      transactional failure with a Refresh Membership message (see
      Figure 7).

   o  MAY refresh the Layer 2 multicast forwarding rules at any time.

6.1.  Network Topology

   Forwarding Control Agents are applicable for networks which consist
   of one or more local subnets which have direct links with each other.

6.2.  Isolated VLANs Strategy

   Transactional Subset Multicast groups define a very large number of
   multicast addresses which must be delivered within a closed set of IP
   subnets without having to dynamically co-ordinate allocation of these
   multicast addresses with a wider network.

   This MAY be accomplished using a "Isolated VLANs Strategy" where the
   reference multicast group and all transactional multicast groups
   derived from it are used strictly inside of a single VLAN or a set of
   interconnected VLANs which route these multicast groups solely within
   this closed set.

   Specifically, an implementation using the Isolated VLANs Strategy:

   o  MUST include only a pre-defined set of subnets,each enforced with
      a VLAN.

   o  MUST provide for routing or forwarding of all packets using the
      reference multicast group and all transactional subset multicast
      groups derived from it amongst these subnets.

   o  MUST NOT allow any packet using the reference multicast group or
      any transactional subset multicast groups derived from it to be
      routed to any subnet that is not part of the identified Isolated
      VLAN set.

   o  MAY/SHOULD guard the confidentiality of multicast packets routed
      between subnets that transit subnets that are not part of the
      Isolated VLAN set.

   Applications MAY use the Isolated VLAN Strategy.  Virtually all
   applications will elect to do so because allocating a very large
   block of adjacent multicast addresses would be very difficult.
   Confining usage of these addresses to a single VLAN is highly
   desirable.



Bestler & Novak          Expires March 14, 2015                [Page 13]
Internet-Draft    Transactional Subset Multicast Groups   September 2014


   Direct connections between the VLANs hosting Forwarding Control
   Agents is required because the Transactional Subset Multicast Groups
   are not known to any intermediate multicast routers that would
   implement indirect links.  Co-locating Forwarding Control Agents with
   RBridges [[RFC6325]] MAY be a solution.

7.  Forwarding Control Agent Methods

7.1.  Dynamically Pushed Subset Groups

   Each Pushed Subset Membership commands MUST contain the following:

   o  Subset Transactional Multicast Group: Group multicast address that
      is to have its multicast forwarding rules updated.  This address
      must be within a block of Transactional Multicast Groups
      previously created using the Create Transactional Multicast
      Address Block command (Section 10.1).

   o  Target List: List of IP Addresses which are to be the targets of
      this group.  These addresses are intended to be members of the
      reference group.  When formulating the list, non-members MUST NOT
      be included.  However there is no transaction lock placed upon the
      group, and therefor there may be changes in the group membership
      before the message is received.  Therefore the Forwarding Control
      Agent MUST ignored any listed target that is not a member of the
      reference group.

   This sets the multicast forwarding rules for pre-existing multicast
   forwarding address X to be the subset of the forwarding rules for
   existing group Y required to reach a specified member list.

   This is done by communicating the same instruction (above) to each
   multicast forwarding network element.  This can be done by unicast
   addressing with each of them, or by multicasting the instructions.

   Each multicast forwarder will modify its multicast forwarding port
   set to be the union of the unicast forwarding it has for the listed
   members, but result must be a subset of the forwarding ports for the
   parent group.

   For example, consider an instruction is to modify a transaction
   multicast group I which is a subset of multicast group J to reach
   addresses A,B and C.

   Addresses A and B are attached directly to multicast forwarder X,
   while C is attached to multicast forwarder Y.

   On forwarder X the forwarding rule for new group I contains:



Bestler & Novak          Expires March 14, 2015                [Page 14]
Internet-Draft    Transactional Subset Multicast Groups   September 2014


   o  The forwarding port for A.

   o  The forwarding port for B.  The forwarding port to forwarder Y (a
      hub link).  This eventually leads to C.

   While on forwarder Y the forwarding rule for the new group I will
   contain:

      The forwarding port for forwarder X (a hub link).  This eventually
      leads to A and B.

      The forwarding port for C.

   This assumes that the Forwarding Control Agent can perform a two-step
   translation: first from IP Address to MAC Address, and then from MAC
   Address to forwarding port.  For typical applications of
   Transactional Subset Multicasting, all of the referenced IP Addresses
   will have been involved in recent messaging, and therefore will
   frequently already be cached.

   Many ethernet switches already support command line and/or SNMP
   methods of setting these multicast forwarding rules, but it is
   challenging for an application to reliably apply the same changes
   using multiple vendor specific methods.  Having a standardized method
   of pushing the membership of a multicast group from the sender would
   be desirable.

   A Forwarding Control Agent MAY accept a request where the Target List
   is expressed as a list of destination L2 MAC addresses.

7.2.  Persistent Transactional Subset Groups

   There is a large group of pre-configured multicast groups which are
   an enumeration of the possible subsets of a master group.  This will
   be a specific subset, such as all combinations of 3 members for
   multicast group X.  These groups are enumerated and assuaged
   successive multicast addresses within a block.

   The sender first obtains exclusive permission to utilize a portion of
   the reception capacity of each desire target, and then selects the
   multicast address that will reach that group.

   In a straightforward enumeration of 3 members out of a group of 20,
   there are 20*19*18/3*2 or 1040 possible groups.  Typically the higher
   layer protocol will have negotiated the right to send the transaction
   with the member prior to selecting the multicast group.  In making
   the final selection, the actual multicast group is selected and some
   offered targets are declined.



Bestler & Novak          Expires March 14, 2015                [Page 15]
Internet-Draft    Transactional Subset Multicast Groups   September 2014


   Those 1040 possible groups can be enumerated in order (starting with
   M1, M2 and M3 and ending with M18, M19 and M20) and assigned
   multicast addresses from N to N+1039.

   When the transaction requires reaching M4, M5 and M19, you simply
   select that group.  Because exclusive rights to use multicasting to
   M4, M5 and M19 have already been obtained through the higher layer
   protocol the group [M4,M5,M19] is already exclusively claimed.

   These 1040 groups may be set up through any of the following means:

   o  Traditional IGMP/MLD joining/leaving.

   o  Setting static forwarding rules using SNMP MIBs and/or switch-
      specific command line interfaces.  Note that the wide-spread
      existence of command line interfaces to custom set multicast
      forwarding rules is an indicator that there are existing
      applications that find the exising IGMP/MLD protocols to be
      inadequate to fulfill their needs.

   o  The Dynamically Pushed Multicast Group method.  See Section 7.1

8.  Relationship to Existing Multicast Membership Protocols

   TBD: briefly describe and cite IGMP, MLD and PIM.

   Transactional Subset Multicast Groups are not a replacement for Join-
   based management of Multicast Groups.  Rather it extends the group
   maintenance performed by the Join-based multicast control protocols
   from the reference group to any entire set of multicast addresses
   that are subsets of it.

   This extension requires no modification to the existing data-plane
   multicast forwarding protocols or implementations.  Transactional
   Subset Multicast groups may be implemented solely in the sender,
   receivers and the Forwarding Control Agents associated with each
   multicast forwarder supporting the reference group.

   The maintenance work of the Join-based multicast protocols performed
   on the reference multicast group is leveraged to allow maintenance of
   a potentially large number of derived Transactional Multicast groups.
   This allows identification of a large number of subsets of the
   reference group, without requiring a matching increase in the
   maintenance traffic which would have been required had the derived
   groups been formed with a Join-based protocol.






Bestler & Novak          Expires March 14, 2015                [Page 16]
Internet-Draft    Transactional Subset Multicast Groups   September 2014


9.  Control Protocol

   Note: the pre-standard protocol relies on multicasting of commands
   within a single secure VLAN.  More general usage of these techniques
   will require transmitting Forwarding Control Agent instructions
   between subnets where they may be subject to interception and even
   alteration.  Therefore a more secure method of delivering Forwarding
   Control Agent instructions is required.

   The methods standardized by the KARP (Key Authentication for Router
   Protocols) are, in the Authors' opinion, fully applicable to this
   protocol.  See [RFC6518].  Working Group feedback is sought as to how
   to expand this section, whether to split the Control Protocol to a
   separate document, or other methods of dealing with the control
   protocol.

   The following requirements apply to any Control Protocol used:

   o  Each request MUST be uniquely identified.  This identification
      MUST include the source IP address of the requester.

   o  The message MUST be authenticated.

   o  WG discussion is needed to reach a consensus as to whether the
      message contents need to be kept confidential, or whether
      preventing alteration is sufficient.

   o  The sender MUST NOT be required to transmit the command more than
      once other than as required for retries.  For example, requiring
      SSH connections with each Forwarding Control Agent is not
      acceptable.

   o  Barring network errors, the message MUST be delivered to all
      Forwarding Control Agents that can receive the reference master
      group.

10.  Forwarding Control Agent Methods

10.1.  Create Transactional Multicast Address Block

   TBD:This section will define the fields required for the command to
   create a block of transactional subset multicast addresses within a
   specific VLAN.  The command defined here is delivered within a
   control protocol.







Bestler & Novak          Expires March 14, 2015                [Page 17]
Internet-Draft    Transactional Subset Multicast Groups   September 2014


      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                Opcode=CreateTransactionalMulticast            |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                    Base Multicast Group Number                |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+--+
      |               Number of Addresses required in Block           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+_-+--+-+

       Figure 1: Create Transcaction Multicast Address Block Message

   The Multicast Group Number is the 24-bit L2 Multicast MAC address.
   This matches both the IPV4 and IPV6 addresses which map to it.  A
   given UDP datagram is sent using either an IPV4 or an IPV6 address,
   so the membership of a Multicast Group is either IPV4 endpoints or
   IPV6 endpoints at any given instant.

   This command does not allow creating numerically scattered group of
   addresses.  Doing so would have made the job of each Forwarding
   Control Agent more complex, and would be of no benefit in the
   recommended Isolated VLANs strategy (See Section 6.2).

   note: add IANA language here

10.2.  Release Transactional Multicast Address Block

      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |              Opcode=ReleaseTransactionalMulticast             |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                    Base Multicast Group Number                |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+--+


       Figure 2: Release Transcactin Multicast Address Block Message

   note: add IANA language here

10.3.  Set Dynamic Transactional Multicast Group Membership IPV6










Bestler & Novak          Expires March 14, 2015                [Page 18]
Internet-Draft    Transactional Subset Multicast Groups   September 2014


      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |         Opcode=PushTransactionalMulticastMembershipIPV6       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | # members     |        Multicast Group Number                 |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      |                    IPV6 Address of 1st Member                 |
      |                                                               |
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       ...

      Figure 3: Set Dynamic Transactional Multicast Group Membership
                                  Message

   Members: 8 bit unsigned number of IPV6 addresses that are to be the
   target of this specified Multicast Group Number.

   note: add IANA language here

10.4.  Set Dynamic Transactional Multicast Group Membership IPV4

      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |         Opcode=PushTransactionalMulticastMembershipIPV4       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | # members     |        Multicast Group Number                 |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                     IPV4 Address of 1st member                |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       ...

      Figure 4: Set Dynamic Transactional Multicast Group Membership
                                  Message

   Members: 8 bit unsigned number of IPV6 addresses that are to be the
   target of this specified Multicast Group Number.

   note: add IANA language here

10.5.  Set Persistent Transactional Multicast Groups IPv6







Bestler & Novak          Expires March 14, 2015                [Page 19]
Internet-Draft    Transactional Subset Multicast Groups   September 2014


      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |            Opcode=PushPersistentMulticastMembershipIPV6       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | select N      |      Base Multicast Group Number to be        |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | # members     |        Reference Multicast Group Num          |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                    IPV6 Address of 1st Member                 |
      |                                                               |
      |                                                               |
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       ...

   Figure 5: Set Persistent Transactional Multicast Groups Message IPV6

   Members: 8 bit unsigned number of Members that are to be included in
   each Transactional Subset Group set by this command.

   Base Multicast Group Number to be set.

   # Members in the following list of IPV6 addresses.  These must all be
   members of the Reference Multicast Group.

   Reference Multicast Group Num: 24 bit L2 Multicast Group Number.

   The motivation for supplying the list of IP addresses is to avoid
   race conditions where an IGMP or MLD join is in progress.  If there
   were a method to refer to a specific generation of a multicast group
   membership then it would be possible to omit this list.  Working
   Group suggestions are encouraged on this topic.

   note: add IANA language here

10.6.  Set Persistent Transactional Multicast Groups IPv4














Bestler & Novak          Expires March 14, 2015                [Page 20]
Internet-Draft    Transactional Subset Multicast Groups   September 2014


      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |            Opcode=PushPersistentMulticastMembershipIPV6       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | select N      |      Base Multicast Group Number to be        |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | # members     |        Reference Multicast Group Num          |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                    IPV4 Address of 1st Member                 |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       ...

   Figure 6: Set Persistent Transactional Multicast Groups Message IPv4

   Members: 8 bit unsigned number of Members that are to be included in
   each Transactional Subset Group set by this command.

   Base Multicast Group Number to be set.

   # Members in the following list of IPV6 addresses.  These must all be
   members of the Reference Multicast Group.

   Reference Multicast Group Num: 24 bit L2 Multicast Group Number.

   note: add IANA language here

10.7.  Refresh Persistent Transactional Multicast Group

      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                  Opcode=RefreshMulticastMembership            |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | reserve       |    Multicast Group Number to be Refreshed     |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | reserved      |        Reference Multicast Group Num          |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    Figure 7: Refresh Persistent Transactional Multicast Groups Message

   The existing Join-directed multicast group control protocols maintain
   delivery of a multicast group to the subscribers independent of
   network topology changes either at Layer 2 or layer 3.  If a unicast
   IP datagram to a member would be delivered, then the multicast
   forwarding can be expected to also be current.





Bestler & Novak          Expires March 14, 2015                [Page 21]
Internet-Draft    Transactional Subset Multicast Groups   September 2014


   Transactional subset multicast groups do not require the same effort
   for maintenance.  For a given transaction the entire set of datagrams
   is either delivered or it is not.  There is no benefit to the
   application that the Forwarding Control Agent can achieve by promptly
   updating the L2 multicast forwarding tables after a network topology
   change.  The current transaction will miss at least one datagram, and
   therefore does not care if it misses multiple datagrams.

   However, a Persistent Transactional Subset Mutlicast Group is used
   for a sequence of transactions targeting the same group.  The upper
   layer protocol sender must have obtained exclusive rights to use the
   group for the period of time that it will be sending the transaction.

   One method that it MAY use is to obtain the exclusive right to send
   the specific type of transaction to each of the members of the
   targeted group during negotiations conducted prior to use of the
   transactional group.  For example, a reservation on inbound bandwidth
   may have been granted.

   The Forwarding Control Agent MAY refresh its mapping from member IP
   addresses to L2 MAC address and then to L2 forwarding port at any
   time.  However it MUST do so after receipt of a Refresh Transactional
   Subset Multicast Group for the group.

   The sender of a transaction SHOULD send a Refresh Transactional
   Subset Multicast Group message after it fails to receive
   acknowledgement of an attempted transaction.

11.  Security Considerations

   The methods described here enable no sender to multicast messages to
   any destination that was not already addressable by it.  Therefore no
   new security vulnerabilities are enabled by these techniques.

   Because authentication of subset commands is kept lightweight there
   is an implicit trust within the application that transactional subset
   groups will be formed or selected in accordance with application
   layer expectations.  The transport layer lacks sufficient information
   to enforce application layer expectations.  If a malicious actor
   deliberately creates a transactional subset multicast group with an
   incorrect group it may adversely impact the operation of the specific
   upper layer application.  However in no case can it be used to launch
   a denial of service attack on targets that have not already
   voluntarily joined the reference group

   The protocol does not currently provide any mechanism to guard
   against selecting an existing but unrelated multicast group as a
   reference multicast group.  Explicitly enabling use of an existing



Bestler & Novak          Expires March 14, 2015                [Page 22]
Internet-Draft    Transactional Subset Multicast Groups   September 2014


   multicast group to be a reference group would not solve the problem
   that the existing management of multicast groups is not aware of the
   need to explicitly forbid creation of derived multicast groups based
   upon a multicast group that it creates.

12.  IANA Considerations

   To be completed.

13.  Summary

   The proposal provides for two new methods to manage multicast group
   membership, Thee are simple techniques, but provide a cohesive
   cluster-wide approach to providing transactional multicasting.  These
   techniques are better suited for transactional multicasting that the
   existing methods, IGMP and MLD, which are oriented to streaming use-
   cases.

14.  References

14.1.  Informative References

   [Replicast]
              Bestler, C., "White Paper: Nexenta Replicast
              http://info.nexenta.com/rs/nexenta/images/
              Nexenta_Replicast_White_Paper.pdf", November 2013.

   [MPI]      MPI Forum, "Message Passing Inteface", 2012.

   [AmazonS3]
              Amazon, "Amazon Simple Storage Service (S3)
              http://aws.amazon.com/s3/", 2014.

   [Swift]    Openstack, "OpenStack Object Service (Swift)
              http://docs.openstack.org/developer/swift/", 2014.

   [IEEE.802.1Qau-2011]
              IEEE, "IEEE Standard for Local and Metropolitan Area
              Networks: Virtual Bridged Local Area Networks - Amendment
              10: Congestion Notification", IEEE Std 802.1Qau, 2011.

   [IEEE.802.1Qaz-2011]
              IEEE, "IEEE Standard for Local and Metropolitan Area
              Networks: Virtual Bridged Local Area Networks - Amendment
              18: Enhanced Transmission Selection.", IEEE Std 802.1Qaz,
              2011.





Bestler & Novak          Expires March 14, 2015                [Page 23]
Internet-Draft    Transactional Subset Multicast Groups   September 2014


   [IEEE.802.1Qbb-2011]
              IEEE, "IEEE Standard for Local and Metropolitan Area
              Networks: Virtual Bridged Local Area Networks - Amendment
              17: Priority-based Flow Control.", IEEE Std 802.1Qbb,
              2011.

   [RFC5661]  Shepler, S., Eisler, M., and D. Noveck, "Network File
              System (NFS) Version 4 Minor Version 1 Protocol", RFC
              5661, January 2010.

14.2.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3376]  Cain, B., Deering, S., Kouvelas, I., Fenner, B., and A.
              Thyagarajan, "Internet Group Management Protocol, Version
              3", RFC 3376, October 2002.

   [RFC3810]  Vida, R. and L. Costa, "Multicast Listener Discovery
              Version 2 (MLDv2) for IPv6", RFC 3810, June 2004.

   [RFC4541]  Christensen, M., Kimball, K., and F. Solensky,
              "Considerations for Internet Group Management Protocol
              (IGMP) and Multicast Listener Discovery (MLD) Snooping
              Switches", RFC 4541, May 2006.

   [RFC4604]  Holbrook, H., Cain, B., and B. Haberman, "Using Internet
              Group Management Protocol Version 3 (IGMPv3) and Multicast
              Listener Discovery Protocol Version 2 (MLDv2) for Source-
              Specific Multicast", RFC 4604, August 2006.

   [RFC6325]  Perlman, R., Eastlake, D., Dutt, D., Gai, S., and A.
              Ghanwani, "Routing Bridges (RBridges): Base Protocol
              Specification", RFC 6325, July 2011.

   [RFC6518]  Lebovitz, G. and M. Bhatia, "Keying and Authentication for
              Routing Protocols (KARP) Design Guidelines", RFC 6518,
              February 2012.

Authors' Addresses










Bestler & Novak          Expires March 14, 2015                [Page 24]
Internet-Draft    Transactional Subset Multicast Groups   September 2014


   Caitlin Bestler (editor)
   Nexenta Systems
   455 El Camino Real
   Santa Clara, CA
   US

   Email: caitlin.bestler <at> nexenta.com,cait <at> asomi.com


   Robert Novak
   Nexenta Systems
   455 El Camino Real
   Santa Clara, CA
   US

   Email: robert.novak <at> nexenta.com



































Bestler & Novak          Expires March 14, 2015                [Page 25]


Caitlin Bestler | 11 Sep 00:23 2014

Fwd: New Version Notification for draft-bestler-transactional-subset-multicast-00.txt

The initial feedback from my pre-draft posting were very useful. One of the hardest things in writing these protocols is realizing
what you haven’t stated because it was just too obvious to you. The first round of feedback was very helpful.


Begin forwarded message:

Subject: New Version Notification for draft-bestler-transactional-subset-multicast-00.txt
Date: September 10, 2014 at 3:17:36 PM PDT
To: Caitlin Bestler <caitlin.bestler <at> nexenta.com>, "Robert Novak" <robert.novak <at> nexenta.com>, Robert Novak <robert.novak <at> nexenta.com>, "Caitlin Bestler" <caitlin.bestler <at> nexenta.com>


A new version of I-D, draft-bestler-transactional-subset-multicast-00.txt
has been successfully submitted by Caitlin Bestler and posted to the
IETF repository.

Name: draft-bestler-transactional-subset-multicast
Revision: 00
Title: Creation of Transactional Subset Multicast Groups
Document date: 2014-09-10
Group: Individual Submission
Pages: 25
URL:            http://www.ietf.org/internet-drafts/draft-bestler-transactional-subset-multicast-00.txt
Status:         https://datatracker.ietf.org/doc/draft-bestler-transactional-subset-multicast/
Htmlized:       http://tools.ietf.org/html/draft-bestler-transactional-subset-multicast-00


Abstract:
  This memo presents techniques for controlling the membership of
  multicast groups which are constrained to be a subset of a pre-
  existing multicast group, where such subset groups are only used for
  short duration transactions which are multicast to a subset of the
  larger multicast group.




Please note that it may take a couple of minutes from the time of submission
until the htmlized version and diff are available at tools.ietf.org.

The IETF Secretariat


Tirumaleswar Reddy (tireddy | 10 Sep 19:07 2014
Picon

FW: New Version Notification for draft-wing-tsvwg-turn-flowdata-01.txt

The draft is updated to include upstream and downstream bandwidth parameters. Comments and suggestions
are welcome.

-Tiru

-----Original Message-----
From: internet-drafts <at> ietf.org [mailto:internet-drafts <at> ietf.org] 
Sent: Wednesday, September 10, 2014 10:31 PM
To: Ram Mohan R (rmohanr); Brandon Williams; Tirumaleswar Reddy (tireddy); Dan Wing (dwing);
Tirumaleswar Reddy (tireddy); Dan Wing (dwing); Ram Mohan R (rmohanr); Brandon Williams
Subject: New Version Notification for draft-wing-tsvwg-turn-flowdata-01.txt


A new version of I-D, draft-wing-tsvwg-turn-flowdata-01.txt
has been successfully submitted by Tirumaleswar Reddy and posted to the
IETF repository.

Name:		draft-wing-tsvwg-turn-flowdata
Revision:	01
Title:		TURN extension to convey flow characteristics
Document date:	2014-09-10
Group:		Individual Submission
Pages:		13
URL:            http://www.ietf.org/internet-drafts/draft-wing-tsvwg-turn-flowdata-01.txt

Status:         https://datatracker.ietf.org/doc/draft-wing-tsvwg-turn-flowdata/

Htmlized:       http://tools.ietf.org/html/draft-wing-tsvwg-turn-flowdata-01

Diff:           http://www.ietf.org/rfcdiff?url2=draft-wing-tsvwg-turn-flowdata-01


Abstract:
   TURN server and the network in which it is hosted due to load could
   adversely impact the traffic relayed through it.  During such high
   load event, it is desirable to shed some traffic but TURN server lack
   requirements about the flows to prioritize them.  This document
   defines such a mechanism to communicate flow characteristics from the
   TURN client to its TURN server.

                                                                                  


Please note that it may take a couple of minutes from the time of submission
until the htmlized version and diff are available at tools.ietf.org.

The IETF Secretariat

Caitlin Bestler | 28 Aug 19:02 2014

Re: Transactional Multicast within Clusters


On Aug 26, 2014, at 8:43 PM, Dave Taht <dave.taht <at> gmail.com> wrote:

> This made my head hurt. Got code?

This is a very distributed algorithm. Actual code would be very dependent on implementation specifics
at each of the execution locations.

But to break this down, there are four relevant locations in this protocol:

1) The sender
2) The data plane forwarder of multicast frames/packets in the switch/mrouter.
3)The software which updates the control data that the data plane forwarder uses.
    This may be a processor that is part of the switch/mrouter. Concieably this could be
    built “in-line” in the forwarding chip, but that is not required and is probably not desirable.
   The amount of logic that can be considered trivial grows every year, so I don’t want to state
   that it would *never* be done in-line, but I wouldn’t want to buy such a switch anytime soon.
4) The receiver.

Something that I neglected to highlight in the first message is that 2) is unchanged from
existing switches/mrouters. This was a major constraint on the design, but it can be so
obvious when your working on it that you can forget to state it.

The detailed logic example for Method 1 provide in the original post applies to 3) the forwarder-associated
editing of the data used by the data plane to control multicast forwarding.

> 
>> 
>> On forwarder X the forwarding rule for new group I contains:
>>        The forwarding port for A.
>>        The forwarding port for B.
>>        The forwarding port to forwarder Y (a hub link). This eventually leads to C.
>> 
>> While on forwarder Y the forwarding rule for the new group I will contain:
>>        The forwarding port for forwarder X (a hub link). This eventually leads to A and B.
>>        The forwarding port for C.
>> 

This assumes the common data plane forwarding model:
* VLAN ID + multicast address indexes to a set of ports. The frame is forwarded on all of those ports except the
port the frame arrived upon.

So the forwarding port set to reach [A,B,C] is the union of the  port sets for [A],[B] and [C].

This is done when the dynamic transactional multicast group is set, not on a per frame/packet basis.

For method 1 the sender and the receivers have agreed to transmit a transaction on transaction multicast
group X, and that the membership
of the transactional group will be [A,B,C]. Rather than having A,B and C use IGMP/MLD to *join* the group,
the sender uses method 1 to
*push* the membership of the group to the relevant switches/mrouters.

Method 2 has the same two layer approach. The setup command is just more ambitious.
It is creating a large set of transactional multicast groups to cover every legitimate set of targets
within an existing multicast group.  For example, all combinations of 3 members of a 10 member
reference group.

One implementation would be as follows:

	t_group = base_t_group
	do
		set forwarding rule for t_group to union of forwarding ports for t_group’s members
		t_group = t_group + 1
	while permuting membership yields a new combination

With method 2 the sender uses the master group (referred to as the “Negotiating Group” in Nexenta’s
Replicast protocol) to announce which
of the pre-existing transactional_groups will be used to complete a given transaction. That
transctional group is now associated with the
transaction because each of the receivers has already granted some form of exclusive right to multicast to
it for a specific overlapping 
duration. This may be exclusive to the port/VLAN, or could conceivably be for a more specific scope. This is
dependent on the higher
layer protocol that is using these new methods for creating transactional multicast groups.

	

>> 
>> 
>> Caitlin Bestler
>> caitlin.bestler <at> nexenta.com
>> cait <at> asomi.com
>> 
>> 
> 


Gmane