Weighted Fair Queueing Scheduler (SCTP_SS_WFQ) description in ndata

Hi Michael,

 

In the context of understanding the SCREAM multi stream scheduling solution (rmcat) I tried to do

some reference checking to understand the exact algorithm envisaged for the ndata 3.1.6.  Weighted Fair Queueing Scheduler (SCTP_SS_WFQ).

 

Here it looks as if there is a circular reference between

 

https://tools.ietf.org/html/draft-ietf-rtcweb-data-channel-13

section 6.4:


 

   o  A priority, which is a 2 byte unsigned integer.  These priorities
      MUST be interpreted as weighted-fair-queuing scheduling priorities
      per the definition of the corresponding stream scheduler
      supporting interleaving in [I-D.ietf-tsvwg-sctp-ndata].  For use
      in WebRTC, the values used SHOULD be one of 128 ("below normal"),
      256 ("normal"), 512 ("high") or 1024 ("extra high").

 

and

http://datatracker.ietf.org/doc/draft-ietf-tsvwg-sctp-ndata/

section 3.1.6/3.2.6

 

3.2.5.  Weighted Fair Queueing Scheduler (SCTP_SS_WFQ_INTER)

 


   This scheduler is similar to the one described in Section 3.1.6, but
   based on I-DATA chunks instead of user messages.  This scheduler is
   used for WebRTC Datachannels as specified in
   [I-D.ietf-rtcweb-data-channel].

 

3.1.6.  Weighted Fair Queueing Scheduler (SCTP_SS_WFQ)

 

   A weighted fair queueing scheduler between the streams is used.  The
   weight is configurable per outgoing SCTP stream.  This scheduler
   considers the lengths of the messages of each stream and schedules
   them in a certain way to use the bandwidth according to the given
        ^^^^^^^^^^^^^^^^
   weights.

 

 

But neither document explicitly says, e.g., by at least a reference, what _the certain way_ of
weighted-fair-queuing scheduling is.  Is the intent for the exact algorithm to be left to the
implementation, with the only requirement being that the required weighted bandwidth fairness is
achieved over time?

Is it up to implementations to decide, to give just a few examples, how to overshoot the CWND to make
it grow if this means that the weighted bandwidth is breached?  Which stream goes first in consuming
the CWND to begin with?  How to evaluate message bundling options vis-à-vis fair weights?  More…

Would it not be relevant to give a general reference, and possibly some implementation notes on how
such a solution is or can be built, or to state more explicitly what is left to the implementation,
with a hint at some of the intricate aspects involved?  I assume that the FreeBSD SCTP implementation
would be a great source of information for this (?)
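
To illustrate the kind of detail that is currently unspecified, here is a minimal sketch of one
textbook way to realise message-based weighted fair queueing: self-clocked fair queueing with
virtual finish tags.  This is only an assumption about what "a certain way" could mean; it is not
taken from either draft or from the FreeBSD code, and the class and method names are hypothetical.
It ignores CWND handling and bundling entirely.

```python
from collections import deque


class WfqScheduler:
    """Illustrative message-based WFQ using virtual finish tags (hypothetical API)."""

    def __init__(self):
        self.weights = {}        # stream id -> configured weight
        self.queues = {}         # stream id -> deque of (finish_tag, message)
        self.last_finish = {}    # stream id -> finish tag of the newest queued message
        self.virtual_time = 0.0  # follows the finish tag of the message last served

    def add_stream(self, sid, weight):
        self.weights[sid] = weight
        self.queues[sid] = deque()
        self.last_finish[sid] = 0.0

    def enqueue(self, sid, message):
        # A message starts (virtually) when the stream's previous message
        # finishes, or now if the stream was idle.
        start = max(self.virtual_time, self.last_finish[sid])
        # Longer messages and smaller weights push the finish tag further out.
        finish = start + len(message) / self.weights[sid]
        self.last_finish[sid] = finish
        self.queues[sid].append((finish, message))

    def dequeue(self):
        # Serve the stream whose head-of-line message has the smallest finish
        # tag; ties are broken by the lower stream id.
        heads = [(q[0][0], sid) for sid, q in self.queues.items() if q]
        if not heads:
            return None
        finish, sid = min(heads)
        _, message = self.queues[sid].popleft()
        self.virtual_time = finish
        return sid, message
```

With two backlogged streams of weight 256 and 512 sending equal-sized messages, this sketch
serves the second stream twice as often, which is the long-run bandwidth split the weights
suggest.  How it should interact with the CWND is exactly the open question above.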

 

BR, Karen

Black, David | 17 Aug 04:29 2015

TSVWG WGLC: draft-ietf-tsvwg-circuit-breaker-02

This email announces a TSVWG Working Group Last Call (WGLC) on:

	Network Transport Circuit Breakers
	draft-ietf-tsvwg-circuit-breaker-02 

Due to the timing (many people take time off during August), this WGLC will
run for longer than usual - it will be for just over 3 weeks, ending at
midnight US Eastern (daylight) Time on Tuesday, September 8 (this date &
time have been chosen to be after the US Labor Day holiday on Monday,
September 7).  Comments should be sent to the tsvwg <at> ietf.org list,
although purely editorial comments may be sent directly to the author
(Gorry Fairhurst <gorry <at> erg.abdn.ac.uk>).

As part of this WGLC, comments are requested on the appropriate RFC status
for this draft - this circuit breaker draft has been written with BCP (Best
Current Practice) RFC status in mind, but another status (e.g., Proposed
Standard) may be appropriate.  Reviews and comments should assume BCP as
the intended status, although that may change as a result of the WGLC
(in consultation with our ADs).

As I mentioned in the Prague tsvwg meeting, there is an expired WG draft
(formerly in the pwe3 WG, now the responsibility of the pals WG) on congestion
considerations for TDM pseudowires (PWs) - that draft contains some discussion
of a managed circuit breaker for TDM PWs.  If that draft reappears before
the end of this tsvwg LC on the circuit breaker draft, that reappearance
will be taken as a tsvwg WGLC comment requesting changes/additions to
section 5.3.1 of the tsvwg circuit breaker draft, which currently discusses
another managed circuit breaker for SAToP PWs.

To set a good example, I have reviewed the draft (it's well-written, and a
relatively easy read) and have a few LC comments as a WG chair to start this
WGLC:

Section 3.1, end of 1st paragraph:

   A CB is used to control traffic
   passing through a subset of these routers, acting between an ingress
   and a egress point.  In some cases, the ingress and egress may be
   within one or both network endpoints, in other cases they will be
   within a network device.  For example, one expected use would be at
   the ingress and egress of a tunnel service.

Editorial: "tunnel service" -> "service"

Editorial: Figure 1 shows measure and trigger at ingress - they could
also be at egress or in network operations/management infrastructure,
e.g., with trigger event communicated to ingress to take action.  The
text following Figure 1 looks ok, so this is a request for text to
indicate that Figure 1 is not fully general in this regard.

Technical: p.7:

   o  A CB MUST define a measurement period over which the receiver
      measures the level of congestion.

"congestion" -> "congestion or loss"  Congestion may not be directly
measurable in all cases, even when ECN is used, as ECN nodes may drop
packets, and packets can be dropped for non-congestive reasons.

Thanks,
--David
----------------------------------------------------
David L. Black, Distinguished Engineer
EMC Corporation, 176 South St., Hopkinton, MA  01748
+1 (508) 293-7953             FAX: +1 (508) 293-7786
david.black <at> emc.com        Mobile: +1 (978) 394-7754
----------------------------------------------------

C. M. Heard | 14 Aug 20:35 2015

Re: New Version Notification for draft-touch-tsvwg-udp-options-01.txt

On 7/22/2015 09:52 AM, Joe Touch wrote:
> On 7/21/2015 11:22 PM, Brian Trammell wrote:
> > hi Joe,
> >
> > Thanks for this draft; I appreciate the elegant redundancy-reducing
> > length hack. :)
> >
> > Data in this case is, I know, hard to come by, but would you have
> > any feel for how much stuff out there will just break when they see an
> > inconsistency between IP and UDP length information?
>
> I have students starting this fall who will look into this and do some
> tests. We have no information yet.

In an off-list e-mail exchange with Joe a couple of weeks ago, I noted
that every host stack implementation whose code I have inspected simply
ignores bytes that are past the UDP length but within the IP payload
length.  The BSD-derived stacks trim the excess bytes before the data
is passed to the application via the sockets interface.  However, one
embedded stack I have seen (which does not use a sockets API) makes
all data available to the application, including the UDP header, and
lets the application deal with excess bytes as it sees fit.
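
To make the length relationship concrete, here is a small sketch of how a receiver could
separate the UDP data from any bytes that lie past the UDP length but inside the IP payload.
It assumes the IP payload is already available as a byte string; the function name is
hypothetical and this is not code from any of the stacks mentioned above.

```python
import struct


def split_udp_payload(ip_payload: bytes):
    """Return (udp_data, surplus) for an IP payload carrying one UDP datagram.

    surplus holds the bytes beyond the UDP length field, i.e. the area a
    trailer would occupy.  Hypothetical helper for illustration only.
    """
    if len(ip_payload) < 8:
        raise ValueError("too short for a UDP header")
    # UDP header: source port, destination port, length, checksum (16 bits each).
    src_port, dst_port, udp_length, checksum = struct.unpack("!HHHH", ip_payload[:8])
    # The UDP length field counts the 8-byte header plus the data.
    if udp_length < 8 or udp_length > len(ip_payload):
        raise ValueError("UDP length inconsistent with IP payload length")
    data = ip_payload[8:udp_length]
    surplus = ip_payload[udp_length:]
    return data, surplus
```

A stack that "trims the excess bytes" corresponds to returning only the first element; the
embedded stack described above would hand both to the application.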

I have zero information on the behavior of middleboxes (NAT/NAPT).

Assuming that Joe's tests confirm these observations for both end
systems and middleboxes, then the proposed UDP option trailer should be
incrementally deployable as long as all options therein can be safely
ignored if not understood.  The degree of utility (or, at least, the
length of time needed to make them useful) will of course depend
strongly on whether middleboxes trim the trailer or leave it intact;
if the prevalent middlebox practice is to trim it, then they won't be
useful without updating middleboxes as well as end systems.

Also, Joe, if you and your students have the time and resources to look at
what middleboxes do with UDP packets where the IP header indicates a
shorter length than the UDP header, that would be useful information, as it
could open up a possible means to incorporate fragmentation in the UDP
layer, independent of whether or not an options trailer is present.

Mike Heard

internet-drafts | 14 Aug 14:55 2015

I-D Action: draft-ietf-tsvwg-behave-requirements-update-04.txt


A New Internet-Draft is available from the on-line Internet-Drafts directories.
 This draft is a work item of the Transport Area Working Group Working Group of the IETF.

        Title           : Network Address Translation (NAT) Behavioral Requirements Updates
        Authors         : Reinaldo Penno
                          Simon Perreault
                          Mohamed Boucadair
                          Senthil Sivakumar
                          Kengo Naito
	Filename        : draft-ietf-tsvwg-behave-requirements-update-04.txt
	Pages           : 13
	Date            : 2015-08-14

Abstract:
   This document clarifies and updates several requirements of RFC4787,
   RFC5382 and RFC5508 based on operational and development experience.
   The focus of this document is NAT44.

The IETF datatracker status page for this draft is:
https://datatracker.ietf.org/doc/draft-ietf-tsvwg-behave-requirements-update/

There's also a htmlized version available at:
https://tools.ietf.org/html/draft-ietf-tsvwg-behave-requirements-update-04

A diff from the previous version is available at:
https://www.ietf.org/rfcdiff?url2=draft-ietf-tsvwg-behave-requirements-update-04

Please note that it may take a couple of minutes from the time of submission
until the htmlized version and diff are available at tools.ietf.org.

Internet-Drafts are also available by anonymous FTP at:
ftp://ftp.ietf.org/internet-drafts/

RFC Errata System | 7 Aug 20:21 2015

[Editorial Errata Reported] RFC7605 (4437)

The following errata report has been submitted for RFC7605,
"Recommendations on Using Assigned Transport Port Numbers".

--------------------------------------
You may review the report below and at:
http://www.rfc-editor.org/errata_search.php?rfc=7605&eid=4437

--------------------------------------
Type: Editorial
Reported by: John Klensin <john-ietf <at> jck.com>

Section: Abstract

Original Text
-------------
It provides designer guidance to requesters or users of port numbers on
how to interact with IANA using the processes defined in RFC 6335;
thus, this document complements (but does not update) that document.
It provides guidelines for designers regarding how to interact with
the IANA processes defined in RFC 6335, thus serving to complement
(but not update) that document.

Corrected Text
--------------
It provides designer guidance to requesters or users of port numbers on
how to interact with IANA using the processes defined in RFC 6335;
thus, this document complements (but does not update) that document.

Notes
-----
I think those two sentences say exactly the same thing and that the presence of both indicates that someone
wasn't paying quite enough attention during AUTH48 or earlier.  If they are intended to communicate
different information, it isn't clear what that is and the result is massively confusing.

Instructions:
-------------
This erratum is currently posted as "Reported". If necessary, please
use "Reply All" to discuss whether it should be verified or
rejected. When a decision is reached, the verifying party (IESG)
can log in to change the status and edit the report, if necessary. 

--------------------------------------
RFC7605 (draft-ietf-tsvwg-port-use-11)
--------------------------------------
Title               : Recommendations on Using Assigned Transport Port Numbers
Publication Date    : August 2015
Author(s)           : J. Touch
Category            : BEST CURRENT PRACTICE
Source              : Transport Area Working Group
Area                : Transport
Stream              : IETF
Verifying Party     : IESG

Ruediger.Geib | 7 Aug 08:52 2015

Re: RFC 4594 bis

Hi Fred,

I support your proposal to revise RFC 4594. Putting it on the standards track, from my point of view, requires
consensus from network operators on the result. That may become a challenge, but let's try.

Regards,

Ruediger

-----Original Message-----
From: tsvwg [mailto:tsvwg-bounces <at> ietf.org] On Behalf Of Fred Baker (fred)
Sent: Thursday, 6 August 2015 17:32
To: tsvwg <at> ietf.org
Subject: ***CAUTION_Invalid_Signature*** [tsvwg] RFC 4594 bis

At IETF 93, there was ad hoc discussion of revising RFC 4594 and taking it to Proposed Standard. At minimum,
such a project would include incorporating RFC 5865 and editing out statements that, in retrospect, may
be ill-advised. Personally, I might want to collapse some of the video services into a single service.
Most important, to me, would be incorporating operational experience. One of the key triggers for me to
think about this has been statements made in draft-szigeti-tsvwg-ieee-802-11e that I wasn't sure were a
good idea, and which Tim pointed out came directly from RFC 4594. Oops...

It will be a big job, I think.

Before I seriously consider it, I would like to know the opinion of the working group, and of the chairs. Is
this something we want to accomplish? Who would be willing to report their experience and suggest text or
review the outcome?

A little history: the process of writing RFC 4594 took at least three phases. I made an initial suggestion in
ieprep at IETF 54, in Yokohama. This was met with significant resistance from at least one key participant
in the Diffserv work, and I dropped it. My Nortel co-authors picked up the concept and the initial draft and
took it to the ITU effort that was then developing what became (IIRC) G.1010, and then brought it back to the
IETF and solicited my participation in the further development. We had differences of opinion, and we had
a fair bit of operational input on certain services. However, much of it was a literature review - EF came
from RFCs 3246 and 3247, CS* code points came from RFC 2474, AF from RFC 2597, Scavenger ("Low Priority")
from Internet2's service and RFC 3662, and I suspect (but don't know) that the logic regarding service
requirements came at least in part from the ITU discussions. Further subdivisions, especially in the
video classes, came from several vendors' products at the time, which were using AF code points for video
but complaining that the service as described had issues (they wanted to use the AFx2 and AFx3 code points
to identify the more disposable bits in layered codecs, which doesn't work if an ISP uses AF as described).

More recently, James Polk sought to add services in draft-polk-tsvwg-rfc4594-update. It proposed four
services for admitted traffic, building on RFC 5865, to wit:

> This document will import in four new '*-Admit' DSCPs from [ID-DSCP], 
> 2 others that are new but not capacity-admitted, one from RFC 5865, 
> and change the existing usage of 2 DSCPs from RFC 4594.
> This is discussed throughout the rest of this document.

It also changed some DSCP numbers, which is not backward compatible. It was a significant revision. I did
not support the effort, in large part because of lack of operational input to it.

So, *if* this is to be done, I want to be sure folks are interested in doing it, and that operational
experience will be reflected, not just the opinions of armchair theorists.

Who's in?

Fred Baker (fred) | 6 Aug 17:32 2015

RFC 4594 bis

At IETF 93, there was ad hoc discussion of revising RFC 4594 and taking it to Proposed Standard. At minimum,
such a project would include incorporating RFC 5865 and editing out statements that, in retrospect, may
be ill-advised. Personally, I might want to collapse some of the video services into a single service.
Most important, to me, would be incorporating operational experience. One of the key triggers for me to
think about this has been statements made in draft-szigeti-tsvwg-ieee-802-11e that I wasn't sure were a
good idea, and which Tim pointed out came directly from RFC 4594. Oops...

It will be a big job, I think.

Before I seriously consider it, I would like to know the opinion of the working group, and of the chairs. Is
this something we want to accomplish? Who would be willing to report their experience and suggest text or
review the outcome?

A little history: the process of writing RFC 4594 took at least three phases. I made an initial suggestion in
ieprep at IETF 54, in Yokohama. This was met with significant resistance from at least one key participant
in the Diffserv work, and I dropped it. My Nortel co-authors picked up the concept and the initial draft and
took it to the ITU effort that was then developing what became (IIRC) G.1010, and then brought it back to the
IETF and solicited my participation in the further development. We had differences of opinion, and we had
a fair bit of operational input on certain services. However, much of it was a literature review - EF came
from RFCs 3246 and 3247, CS* code points came from RFC 2474, AF from RFC 2597, Scavenger ("Low Priority")
from Internet2's service and RFC 3662, and I suspect (but don't know) that the logic regarding service
requirements came at least in part from the ITU discussions. Further subdivisions, especially in the
video classes, came from several vendors' products at the time, which were using AF code points for video
but complaining that the service as described had issues (they wanted to use the AFx2 and AFx3 code points
to identify the more disposable bits in layered codecs, which doesn't work if an ISP uses AF as described).

More recently, James Polk sought to add services in draft-polk-tsvwg-rfc4594-update. It proposed four
services for admitted traffic, building on RFC 5865, to wit:

> This document will import in four new '*-Admit' DSCPs from
> [ID-DSCP], 2 others that are new but not capacity-admitted, one from
> RFC 5865, and change the existing usage of 2 DSCPs from RFC 4594.
> This is discussed throughout the rest of this document.

It also changed some DSCP numbers, which is not backward compatible. It was a significant revision. I did
not support the effort, in large part because of lack of operational input to it.

So, *if* this is to be done, I want to be sure folks are interested in doing it, and that operational
experience will be reflected, not just the opinions of armchair theorists.

Who's in?

internet-drafts | 3 Aug 09:26 2015

I-D Action: draft-ietf-tsvwg-rfc5405bis-05.txt


A New Internet-Draft is available from the on-line Internet-Drafts directories.
 This draft is a work item of the Transport Area Working Group Working Group of the IETF.

        Title           : UDP Usage Guidelines
        Authors         : Lars Eggert
                          Godred Fairhurst
                          Greg Shepherd
	Filename        : draft-ietf-tsvwg-rfc5405bis-05.txt
	Pages           : 49
	Date            : 2015-08-03

Abstract:
   The User Datagram Protocol (UDP) provides a minimal message-passing
   transport that has no inherent congestion control mechanisms.  This
   document provides guidelines on the use of UDP for the designers of
   applications, tunnels and other protocols that use UDP.  Congestion
   control guidelines are a primary focus, but the document also
   provides guidance on other topics, including message sizes,
   reliability, checksums, middlebox traversal, the use of ECN, DSCPs,
   and ports.

   Because congestion control is critical to the stable operation of the
   Internet, applications and other protocols that choose to use UDP as
   an Internet transport must employ mechanisms to prevent congestion
   collapse and to establish some degree of fairness with concurrent
   traffic.  They may also need to implement additional mechanisms,
   depending on how they use UDP.

   Some guidance is also applicable to the design of other protocols
   (e.g., protocols layered directly on IP or via IP-based tunnels),
   especially when these protocols do not themselves provide congestion
   control.

   If published as an RFC, this document will obsolete RFC5405.

The IETF datatracker status page for this draft is:
https://datatracker.ietf.org/doc/draft-ietf-tsvwg-rfc5405bis/

There's also a htmlized version available at:
https://tools.ietf.org/html/draft-ietf-tsvwg-rfc5405bis-05

A diff from the previous version is available at:
https://www.ietf.org/rfcdiff?url2=draft-ietf-tsvwg-rfc5405bis-05

Please note that it may take a couple of minutes from the time of submission
until the htmlized version and diff are available at tools.ietf.org.

Internet-Drafts are also available by anonymous FTP at:
ftp://ftp.ietf.org/internet-drafts/

Black, David | 2 Aug 18:25 2015

Prague: Confirming meeting decisions

As with all IETF meetings, decisions made during the TSVWG meeting in Prague
need to be confirmed on the list.  The following is the "sense of the room"
on these items from Prague.  Anyone who disagrees should say so on the list,
please.

-- 4.2 GRE over UDP direction

The proposed path forward is to write two different sets of applicability text:

(1) Network operator usage.  This could use GRE in full generality and omit
UDP checksum for IPv6 if/as appropriate, text can be extensively based on
RFC 7510 (MPLS/UDP).

(2) General Internet usage.  This would be restricted - traffic SHOULD be
congestion controlled/responsive (e.g., IP traffic aggregates), UDP checksum
for IPv6 would be REQUIRED.

NB: The "This could" and "This would" sentences above are intended as
examples to illustrate the different applicability scopes - actual WG support
(or lack thereof) for the specific provisions in those sentences will be
based on list discussion, at WG Last Call at the latest.

-- 5.1 SCTP (RFC 4960) Errata and Issues

The "sense of the room" is that the WG should work on SCTP specification
maintenance above and beyond the RFC Editor's collection of errata.  The
WG chairs should consult with the ADs on how to move forward on this.

-- 6.2 DSCPs for Web RTC

Justin Uberti's proposal - set SCTP association DSCP to highest applicable
DSCP of any DataChannel in WebRTC, and reset the congestion control if
needed at the point where the DSCP changes - was supported by the Friday
rtcweb WG meeting.

If there are any tsvwg WG objections to this direction, they should be
posted to the list promptly, please.

-- 6.5 draft-wei-tsvwg-tunnel-congestion-feedback

The "sense of the room" is that the tsvwg WG should adopt this draft
with an Informational RFC target that could be upgraded once the
details on the Congestion Management (response) mechanisms are
specified (which may be via reference to other drafts).

Thanks,
--David
----------------------------------------------------
David L. Black, Distinguished Engineer
EMC Corporation, 176 South St., Hopkinton, MA  01748
+1 (508) 293-7953             FAX: +1 (508) 293-7786
david.black <at> emc.com        Mobile: +1 (978) 394-7754
----------------------------------------------------

Black, David | 2 Aug 18:12 2015

Prague draft minutes

The draft minutes from Prague have been uploaded:
	https://www.ietf.org/proceedings/93/minutes/minutes-93-tsvwg

Please send comments/corrections to the list.

Many thanks to Karen Nielsen and Szilveszter Nadas for taking notes!

Thanks,
--David
----------------------------------------------------
David L. Black, Distinguished Engineer
EMC Corporation, 176 South St., Hopkinton, MA  01748
+1 (508) 293-7953             FAX: +1 (508) 293-7786
david.black <at> emc.com        Mobile: +1 (978) 394-7754
----------------------------------------------------

Bob Briscoe | 30 Jul 13:47 2015

Invitation to subscribe to new DCTCP Evolution mailing list: (tcpPrague <at> ietf.org)

tcpm, tsvwg, tsvarea, iccrg lists

Recently, there have been developments (see URLs at end) that would make it possible to deploy scalable low-latency low-loss protocols like Data Center TCP alongside a mix of traffic, either in data centres and private networks, or even on the public Internet. One approach was demonstrated at the recent IETF in Prague, showing DCTCP giving ultra-low latency over a broadband Internet access while competing with a mix of Internet traffic on roughly equal terms.

As a result of an ad hoc meeting ("Bar BoF" = Birds of a Feather) at the Prague IETF, we have formed a new mailing list.
I'd like to invite you to join the list via: <https://www.ietf.org/mailman/listinfo/tcpprague>

The idea is to ensure that those working on DCTCP implementations across platforms (FreeBSD, Linux, Windows, ...) will converge on solutions that will interwork with each other and with existing traffic. Although it is under the IETF's umbrella, we hope and expect that discussion will be as much about implementation as writing standards. However, we get the benefit of the IETF's IPR disclosure rules, and of course it fits the IETF's purpose of interoperability.


The draft notes of the meeting are below.
And below that, is the original announcement with some context and background URLs.
You can catch up on any discussion you've missed using the list archives via the link above.


If you want to respond about something most relevant to tcpprague, pls avoid cross-posting to all the other lists in this announcement.


Cheers



Bob Briscoe



-------- Forwarded Message --------
Subject: Notes: DCTCP evolution 'bar BoF': Tue 21 Jul 2015, 17:40, Prague
Date: Tue, 28 Jul 2015 14:00:46 +0100
From: Bob Briscoe <ietf <at> bobbriscoe.net>
To: TCP Prague IETF List <tcpPrague <at> ietf.org>


Folks,

These notes have taken a week, because I've only just put my machine back together after having to rebuild the hardware a little :|


Notes: DCTCP Evolution Bar BoF
6-7pm Tue 21 Jul 2015, Budapest room, The Hilton, Prague, CZ

Summary of Actions:
Lars E: Set up tcpprague wiki page
Bob B: Request tcpprague <at> ietf.org mailing list, via IETF process (requires Area Director approval)
Bob B: Document Rationale - initiate a para on wiki.
Lars E: fwd Dagstuhl invitee list to Bob
Bob B: Set up list on wiki to assign people to invite those not in the room to join.

* 18:00 Introductions - name and interest
Present:
Marcelo    Bagnulo Braun    <marcelo <at> it.uc3m.es>
Praveen    Balasubramanian    <pravb <at> microsoft.com>
Martin    Bekker    <martin.becke <at> haw-hamburg.de>
Bob    Briscoe    <ietf <at> bobbriscoe.net>
Anna    Brunstrom    <anna.brunstrom <at> kau.se>
Stuart    Cheshire    <cheshire <at> apple.com>
Koen    De Schepper    <koen.de_schepper <at> alcatel-lucent.com>
Fabien    Duchen    <fabien.duchene <at> uclouvain.be>
Phil    Eardley    <philip.eardley <at> bt.com>
Lars    Eggert    <lars <at> netapp.com>
Michio    Honda    <michio <at> netapp.com>
Per    Hurtig    <Per.Hurtig <at> kau.se>
Jana    Iyengar    <jri <at> google.com>
Naeem    Khademi    <naeem.khademi <at> gmail.com>
Mirja    Kuehlewind    <mirja.kuehlewind <at> tik.ee.ethz.ch>
Matt    Mathis    <mattmathis <at> google.com>
Andrew    McGregor    <andrewmcgr <at> google.com>
Karen    Nielsen    <karen.nielsen <at> tieto.com>
Tommy    Pauly    <tpauly <at> apple.com>
Andreas Petlund <apetlund <at> ifi.uio.no>
Costin    Raiciu    <costin.raiciu <at> cs.pub.ro>
Pasi    Sarolahti    <pasi.sarolahti <at> iki.fi>
Richard    Scheffenegger    <rs <at> netapp.com>
David    Schinazi    <dschinazi <at> apple.com>
Randall    Stewart    <randall <at> lakerest.net>
Dave    Thaler    <dthaler <at> microsoft.com>
Brian    Trammell   <ietf <at> trammell.ch>
Michael    Tuexen    <Michael.Tuexen <at> lurchi.franken.de>
Felix    Weinrank    <weinrank <at> fh-munster.de>
Michael    Welzl    <michawe <at> ifi.uio.no>
Alex    Zimmermann    <alexander.zimmermann <at> netapp.com>

* Scope and Agenda Bashing

[Non-italic text is from the materials pre-prepared by Koen De Schepper and Bob Briscoe.
Italic text summarises conversation in the room.]

Meeting is covered by the standard IETF "Note Well" concerning intellectual property.

Scope:
* Evolving the e2e DCTCP protocol for use alongside existing traffic (whether in DCs, private nets or public Internet).
* Primarily to get DCTCP /developers/ involved early (Windows, FreeBSD, Linux), so that whatever we decide to standardise can be implemented in parallel
  (Doing implementation and standardisation in series is not desirable, in whichever order).
* Primarily an organisational meeting about creating a forum / community to do this work, using people's experience to know what will work best.

Not in Scope:
* Network changes are not in scope unless they impact the list of changes needed to DCTCP
* The in-network side of the solution (two approaches exist [DCttH, Judd15], others may follow).
* Identifier of DCTCP-like traffic (please discuss by email, not in this meeting)

Lars E: Informational draft recording Microsoft's DCTCP should not be stalled by this, as it has value of its own.
   Unanimous agreement.

Praveen B: Microsoft has offered a royalty-free license for DCTCP IPR.

Karen N: Is DCTCP over a non-TCP transport (e.g. SCTP) in scope?
   Unanimous "Yes"

Outcome of discussion on the features of this DCTCP-like congestion control that define this work:
  1. Must use ECN, but unlike RFC3168 ECN, marking is not merely equivalent to drop,
    so ECN signals can be more plentiful and sooner than drop.
  2. Packet rate is proportional to 1/p, where p is the ECN marking probability.
Matt M: 1/p makes congestion control scale with the bandwidth, by making the intensity of congestion control signals per RTT invariant.
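
As a worked illustration of the invariance point: the expected number of marks a flow sees per
RTT is the marking probability times the window.  The symbols W (window in packets per RTT) and
k (a constant of proportionality) are introduced here for illustration only, and the Reno line
uses the usual simplified square-root model as an assumption for contrast.

```latex
% W = congestion window in packets per RTT, p = ECN marking probability,
% k = constant of proportionality (hypothetical symbol, not from the notes).
\begin{align*}
  \text{DCTCP-like:}\quad W &= \frac{k}{p}
    & \Rightarrow\quad \text{marks per RTT} &= pW = k \\
  \text{Reno-like:}\quad W &\approx \sqrt{\frac{3}{2p}}
    & \Rightarrow\quad \text{marks per RTT} &= pW \approx \frac{3}{2W}
\end{align*}
```

Under these assumptions the 1/p law keeps the signal rate per RTT constant however large the
window grows, whereas with the square-root law the signals per RTT thin out as the window grows.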

Stuart Ch: Apple is turning on ECN by default in clients. Currently in developer seeds but probably in the next releases.  Packet loss is also not a mystery.

* 18:15 List of /must-have/ changes before deployment alongside existing traffic.

Matt M: Rather than a "MUST-have" list, produce a prioritised list, because where to draw the necessity line could depend on the use-case.

The following list wasn't formally prioritised in the meeting, but items where some people questioned necessity are shifted down.
  1. Fall back to Reno or Cubic behaviour on loss;
    For how long? quick consensus: 1 RTT, but needs further discussion. ECN response continues to operate in parallel.
  2. Negotiate altered feedback semantics, to convey the extent of ECN marking, not just its existence, and this feedback needs to be robust to loss [RFC-to-be 7560];
    Mirja K, Richard S & Bob B plan to have spec of much simpler solution out soon.
  3. Use of a standardised packet identifier (if ECN-capable is not enough)
    Identifier tbd.
    - - - 8< - - - - - - - - highest line between "must-have for safety" and "would be nice for performance" - - - - - -  8< - - - -
  4. Handle a window of less than 2 when the RTT is low, rather than increase the queue [TCP-sub-mss-cwnd], like TCP Nice.
    Michael W: Is this "must-have"? Quite a complicated step.
    Bob B: Yes, but, otherwise DCTCP will pollute ultra-low latency queues from the start.
  5. Average ECN feedback over its own RTT, not the hard-coded RTT suitable only for data-centres, perhaps reduce cwnd by seg-size/2 per ECN Echo, like Relentless TCP [Mathis09];
    ???: How bad would long-RTT flows be?  More generally, how can we evaluate all this?
    Bob B: With mixed RTTs, flows with RTT > a couple of ms will respond too quickly to bursts. Whatever, it's already been implemented by Mohammad Alizadeh in Linux, and evaluated, so this is easy.
  6. Heuristic testing for classic ECN bottlenecks
    The idea would be to detect a 'classic' RFC3168 bottleneck by whether appreciable delay growth accompanies the marking (originally suggested by Michael W).
    Bob B: Complex and slow to detect, so it would have to learn and cache for new flows - suggest this should only be a must-have if measurements prove it to be a problem - i.e. if a significant proportion of classic ECN bottlenecks exist
    Matt M: No need for this - rate mismatch no worse than TCP already sees with RTT discrepancies.
     - - - 8< - - - - - - - - lowest line between "must-have for safety" and "would be nice for performance" - - - - - -  8< - - - -
  7. Costin R: Faster-than-additive increase (similar to Cubic)
    A performance improvement, not "must-have", but would be nice to have while we're doing this.
  8. [Not discussed in the meeting, but added by Bob B for the record]: Less drastic exit from slow-start, to match less drastic rate reduction per mark.
    Currently, because marking threshold is shallow, slow start exits earlier than with drop, unnecessarily increasing completion time.
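
[Editorial sketch of item 5, for readers unfamiliar with the arithmetic. This is a rough illustration only, not the Linux implementation mentioned above. The function names, the gain g = 1/16 (the value commonly quoted for DCTCP), the floor of 2 segments and the choice to measure cwnd in segments are all assumptions made for the example. It contrasts the usual DCTCP once-per-RTT reduction, driven by a moving average of the marked fraction, with the "seg-size/2 per ECN Echo" alternative.]

```python
def update_alpha(alpha, marked, total, g=1 / 16):
    """Moving average of the marked fraction F: alpha <- (1 - g)*alpha + g*F.

    'marked' and 'total' are counts of acknowledged segments observed over
    one of the flow's own RTTs (assumed bookkeeping, not shown here).
    """
    frac = marked / total if total else 0.0
    return (1 - g) * alpha + g * frac


def dctcp_reduce(cwnd, alpha, min_cwnd=2.0):
    """Once-per-RTT DCTCP-style reduction: cwnd <- cwnd * (1 - alpha / 2)."""
    return max(min_cwnd, cwnd * (1 - alpha / 2))


def per_mark_reduce(cwnd, marks, min_cwnd=2.0):
    """Alternative from item 5: subtract half a segment per ECN Echo."""
    return max(min_cwnd, cwnd - 0.5 * marks)


if __name__ == "__main__":
    alpha = update_alpha(0.0, marked=10, total=10)  # every segment marked
    print(alpha)
    print(dctcp_reduce(100.0, alpha))
    print(per_mark_reduce(10.0, marks=4))
```

Note that the min_cwnd floor of 2 segments used here is exactly the limit that item 4 proposes to relax, so a real implementation of the full list would not keep it in this form.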

Costin R: Any other way to evolve towards DCTCP over mixed networks, without separate queues in the network?
Bob B: To discuss on ML, and if we continue with the proposed approach, we must record the rationale on the wiki.

* 18:30 Brainstorm to identify people not present who will be important to this.

Mohammad    Alizadeh    <alizadeh.mr <at> gmail.com>
Grenville    Armitage    <garmitage <at> swin.edu.au>
Fred    Baker    <fred <at> cisco.com>
Stephen    Bensley    <sbens <at> microsoft.com>
Daniel    Borkmann    <daniel.borkmann <at> alumni.ethz.ch>
Yuchung    Cheng    <ycheng <at> google.com>
Kenjiro    Cho    <kjc <at> iijlab.net>
邓灵莉/Lingli    Deng    <denglingli <at> chinamobile.com>
Eric    Dumazet    <edumazet <at> gmail.com>
Gorry    Fairhurst    <gorry <at> erg.abdn.ac.uk>
Jamal    Hadi Salim    <hadi <at> mojatatu.com>
Glenn    Judd    <glenn.judd <at> morganstanley.com>
Midori    Kato    <katoon <at> sfc.wide.ad.jp>
Kenneth    Klette Jonassen    <kennetkl <at> ifi.uio.no>    (already subscribed)
Sridharan,    Murari    <muraris <at> microsoft.com>
Hiren    Panchasara    <hiren.panchasara <at> gmail.com>
Hagen     Pfeifer    <hagen <at> jauu.net>
Balaji    Prabhakar    <balaji <at> ee.stanford.edu>
KK    Ramakrishnan    <kk <at> cs.ucr.edu>
Lawrence    Stewart    <lstewart <at> netflix.com>
Dave    Taht    <dave.taht <at> gmail.com>
Florian    Westphal    <fw <at> strlen.de>

Agreed to cc to the following for awareness, but no need to invite to join the list:
Stephen    Hemminger    <stephen <at> networkplumber.org>
David    Miller    <davem <at> davemloft.net>

Missing types of organisations:
  • Network operators (not so relevant for e2e protocol, but need to be motivated to deploy the network part)
  • CDNs
[Bob B adds: Subsequent to mtg, Erik Nygren tells me Xin Zhang leads Akamai's congestion control team. Also I noticed Hiren used to work at Limelight, so may have contacts]

Lars E: Co-organising a Dagstuhl retreat around DCTCP. Will forward list of invitees to Bob to notify once the ML exists.
Also Lars's list of FreeBSD and Linux devs.

* 18:40 What is the best way to ensure the outputs from a number of separate developers all converge in parallel to standardisation?
Common Test Suite
Interop events
Plugfests
Serving paths (e.g. Google's) capable of serving this

* 18:50 Next steps: Actions to set up suitable MLs, tools, with timescales etc.

Discussed pros and cons of hosting ML on ietf.org.
General agreement: use ietf.org for ML - because the IPR Note Well is useful.

Name for ML?
Matt M: TCP Prague (for an evolving protocol, a meaningless tag is best).
Karen N: ecn-prague, because it's not just TCP?

Agreed: tcpprague <at> ietf.org

Actions:
Bob B: ML - ask SpencerD/MartinS, following the documented process
Lars E: Set up wiki page - for assigning people to send out invitations

* End 19:05


Notes: Bob Briscoe, helped by Andrew McGregor
28 Jul 2015


-------- Forwarded Message --------
Subject: DCTCP evolution 'bar BoF': Tue 21 Jul 2015, 17:40, Prague
Date: Mon, 20 Jul 2015 22:46:14 +0100
From: Bob Briscoe <ietf <at> bobbriscoe.net>
To: Mirja Kuehlewind <mirja.kuehlewind <at> tik.ee.ethz.ch>, EGGERT, Lars <lars <at> netapp.com>, Dave Thaler <dthaler <at> microsoft.com>, Praveen Balasubramanian <pravb <at> microsoft.com>, Alex Zimmermann <alexander.zimmermann <at> netapp.com>, Richard Scheffenegger <rs <at> netapp.com>, Fred Baker <fred <at> cisco.com>, Matt Mathis <matt.mathis <at> gmail.com>, Andrew McGregor <andrewmcgr <at> google.com>, Dave Taht <dave.taht <at> gmail.com>, Stuart Cheshire <cheshire <at> apple.com>, Michael WELZL <michawe <at> ifi.uio.no>, Andreas Petlund <andreas <at> petlund.no>, Gorry Fairhurst <gorry <at> erg.abdn.ac.uk>, Anna Brunstrom <anna.brunstrom <at> kau.se>
CC: De Schepper, Koen (Koen) <koen.de_schepper <at> alcatel-lucent.com>


Folks,

DCTCP evolution 'bar BoF': Tue 21 Jul 2015, 17:40, Prague
Location: Unless I have emailed with a room location before then, pls meet at the IETF reception.

Koen & I are trying to get together people in Prague who are involved in development of DCTCP across platforms (Windows, FreeBSD, Linux, etc), and who are interested in evolving it for use on heterogeneous networks, e.g.
* data centres with a mix of TCP flavours, not just DCTCP
* private networks
* the public Internet

Pls fwd this invite to anyone in Prague who ought to be involved that I've missed (pls cc everyone else too).

Sorry for short notice.

One purpose of the session will be to build a community beyond the IETF, so I'd like us to compose an email to a wider set of people after the session, e.g.:

Stephen Bensley <sbens <at> microsoft.com>
Glenn Judd <glenn.judd <at> morganstanley.com>
Daniel Borkmann <daniel.borkmann <at> alumni.ethz.ch>
Florian Westphal <fw <at> strlen.de>
邓 灵莉/Lingli Deng <denglingli <at> chinamobile.com>
Mohammad Alizadeh <alizadeh.mr <at> gmail.com>
Stephen Hemminger <stephen <at> networkplumber.org>
David S. Miller <davem <at> davemloft.net>
Sridharan, Murari <muraris <at> microsoft.com>
Yuchung Cheng <ycheng <at> google.com>


Koen & Bob

PS. Below is some background, and some agenda ideas. Pls discuss, bash and add your own.


We've recently developed an AQM that allows DCTCP to co-exist with Cubic/Reno etc. with zero config. Links below.

We have to add some necessary mechanisms to DCTCP (listed below) so it will be safe alongside other traffic. Two questions:

Q1. We don't want to fork DCTCP. Does anyone think a fork optimised for homogeneous DCTCP would be better?

Q2. Anyone interested in helping?
We have an idea how to do each one, but sharing the load would be great - there's Linux, FreeBSD, Windows, etc. to cover.

List of the 4 essential 'safety' mods to DCTCP (copied from the IETF Internet Draft linked below) and one that might need to be essential:

   o  fall back to Reno or Cubic behaviour on loss;

   o  negotiate its altered feedback semantics, which conveys the extent
      of ECN marking, not just its existence, and this feedback needs to
      be robust to loss [I-D.ietf-tcpm-accecn-reqs];

   o  handle a window of less than 2 when the RTT is low, rather than
      increase the queue [TCP-sub-mss-w].

   o  average ECN feedback over its own RTT, not the hard-coded RTT
      suitable only for data-centres, perhaps like Relentless
      TCP [Mathis09];

   o  Use of a standardised packet identifier (if ECN-capable is not enough)

   o  Heuristic testing for classic ECN bottlenecks (optional?)


We're trying to move fast because if we can get on top of other developments (e.g. Apple's recent release of ECN), it will mean less messy classification code in the AQM.
So the links below are not on official sites yet.

‘Data Centre to the Home’: Ultra-Low Latency for All
<http://www.bobbriscoe.net/projects/latency/dctth_preprint.pdf>

Highlights:
* 1ms 99%-ile queuing delay for all DCTCP traffic in thousands of expts incl. high load,
   over an e2e test network with real broadband equipment.
* DCTCP co-existence with Reno & Cubic, with no transport ID inspection.
* less ops per packet than RED
* Zero config

IETF Draft to standardise those parts of the AQM relevant to interop
(not yet submitted to IETF):
<http://www.bobbriscoe.net/projects/latency/draft-briscoe-aqm-dualq-coupled-00.txt>



Koen & Bob

