[AVTCORE] Fwd: Re: [MMUSIC] Issues with bandwidth modifiers in SDP
Magnus Westerlund <magnus.westerlund <at> ericsson.com>
2011-07-08 08:37:06 GMT
WG,
I received comments from Randell Jesup on the MMUSIC reflector, but I
think they are equally relevant to this WG. So here are his comments with
my responses to them.
Hi Randell,
Thanks for the feedback. See inline for feedback on your comments.
On 2011-07-07 17:13, Randell Jesup wrote:
> On 7/7/2011 4:46 AM, Magnus Westerlund wrote:
>> Hi,
>>
>> Bo Burman and I have written an I-D that discusses a number of issues
>> around conferencing, including handling of multiple SSRCs in the same
>> RTP session. This also has brought forward an issue with the current
>> bandwidth modifiers. They are very ill suited when you have both
>> asymmetric conditions based on which direction the flows goes and how to
>> indicate individual streams versus limits for the whole aggregate. This
>> include the issue of having payload specific per stream upper bounds.
>>
>> https://datatracker.ietf.org/doc/draft-westerlund-avtcore-multistream-and-simulcast/
>
> Up front: realize I'm commenting as I read, but that means that it's
> also an indication if
> something wasn't clear up-front without reading more of later parts of
> the spec.
>
>
> The purpose of the document is a little fuzzy - is it for simulcast (as
> stated and defined
> in the doc) or for both multiple-source and simulcast (as implied in the
> document and its
> title). It seems to be the former, but the title and some early
> statements imply it's both.
Yes, this is a document that has grown while we wrote it. I actually
expect that we will start splitting the different pieces into individual
documents and try to clarify what is applicable in which case.
Our intention from the start has been to address both cases. But yes, we
started with simulcast, and for that we do need to resolve the problems
with multiple SSRCs in an RTP session. So simulcast requires multiple
SSRCs, but multiple SSRCs have many usages beyond simulcast.
>
> Also it should be made clear (if it isn't; I'm reading fast) that these
> mixer cases specifically
> would include one-to-one communication through a server. Also: should
> this discuss
> non-mixer cases (point-to-point simulcast)? I think so.
We tried to be clear that our main focus for simulcast is the
centralized case using an RTP mixer that selects which version to
forward. But Section 3 does look at the other usages of simulcast, like
point to point.
>
> Thus there has been perceived little need for
> handling multiple SSRCs in implementations. This has resulted in an
> installed legacy base that isn't fully RTP specification compliant
> and will have different issues if they receive multiple SSRCs of
> media, either simultaneously or in sequence. These issues will
> manifest themselves in various ways, either by software crashes, or
> simply in limited functionality, like only decoding and playing back
> the first or latest SSRC received and discarding any other SSRCs.
>
>
> As an implementer of a point-to-point videophone, we handled SSRC changes,
> but only were set up to decode one stream at a time (especially of
> video). (A later
> phone could decode up to 3 and encode 1 when acting as a bridge for an
> ad-hoc video conference.)
>
> Our algorithm was to immediately react to an SSRC change, but then lock
> out changes for a (short) period: since we only played one stream at a
> time, we didn't want to reset the codec on every other incoming packet.
> This also helped us with a few other use cases involving various
> B2BUA-ish servers and server-mediated call transfers, etc.
>
> Not truly compliant per-se, of course, but worked for the usecases we
> needed to deal with.
Yes, and what you say is a confirmation of the claim we are making.
There are a number of implementations that have limited support for
multiple SSRCs, but they don't express that in signalling. Therefore, to
make these multiple SSRC applications work well going forward, we need
explicit signalling of the number of simultaneous SSRCs one supports.
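
For illustration, the lock-out behaviour Randell describes above could
look roughly like the sketch below. It is only a guess at the shape of
such a receiver, not his actual implementation: the class name, the
0.5 second default and the choice to start the lock-out on the very
first SSRC are all assumptions made for the example.

```python
import time


class SsrcSwitchGuard:
    """Follow one SSRC at a time and ignore switches during a hold-off period."""

    def __init__(self, lockout_seconds=0.5, clock=time.monotonic):
        self.lockout_seconds = lockout_seconds  # assumed value, not from the thread
        self.clock = clock                      # injectable so it can be tested
        self.current_ssrc = None
        self.locked_until = 0.0

    def accept(self, ssrc):
        """Return True if a packet from this SSRC should go to the decoder."""
        now = self.clock()
        if ssrc == self.current_ssrc:
            return True
        if self.current_ssrc is not None and now < self.locked_until:
            # Still inside the lock-out window: drop packets from other SSRCs
            # so the codec is not reset on every other incoming packet.
            return False
        # React immediately to the change (decoder reset not shown) and
        # start a new lock-out window.
        self.current_ssrc = ssrc
        self.locked_until = now + self.lockout_seconds
        return True
```

A receiver like this plays exactly one SSRC at a time, which is the
kind of limit that is currently not visible in signalling.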
>
> Simulcasting has the benefit that it is conceptually simple. It
> enables use of any media codec that the participants agree on,
> allowing the mixer to be codec-agnostic. Considering today's video
> encoders, it is less bit-rate efficient in the path from the sending
> client to the mixer but more efficient in the mixer to receiver path
> compared to Scalable Video Coding.
>
>
> If you're targeting a single lower-bitrate encoding, yes. If the
> bandwidth down to each receiver
> is adaptive to their available bandwidth (think mobile reception, or
> WiFi) then giving the mixer
> the ability to subset differently for each receiver without having to
> re-encode. (And it avoids
> having to tell the sender to change bitrates if one of the receivers
> needs a lower rate than the
> sender is sending, and it avoids giving all the other recipients a
> lowest-common-denominator
> stream). Of course this is part of the entire value-proposition of SVC
> - that it enables this sort of
> optimized reception with a single encoding and a low-horsepower mixer.
> Yes, I realize I'm saying
> fairly obvious things here.
Exactly, this is the whole point of simulcasting. And our research shows
that this has some advantages over SVC.
>
> If only resolution and temporal variations
> are needed, this can be implemented using H.264, as each simulcast
> version provides the different resolution, and each media stream
> within a simulcast encoding has temporal scalability using no-
> reference frames.
>
>
> Note that this implies that the mixer parses the H.264 to the point
> where it knows if the frame
> will be used as a reference frame. Not a problem, and probably doesn't
> require any change to the text.
Yes, you need to look at the NRI bits in the H.264 RTP payload header to
make this determination. But you need to do similar operations in an SVC
MANE if you use the single stream transport (SST) payload mode. So there
is no real difference for the temporal scalability operation.
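
As a small sketch of that check: the first octet of an H.264 RTP
payload carries F (1 bit), NRI (2 bits) and Type (5 bits), and an NRI
of 0 marks content that is not used to reconstruct reference pictures.
The function names below are made up for the example, and other NAL
unit types can also carry NRI 0, so a real mixer would likely inspect
the type field as well before dropping anything.

```python
def nal_ref_idc(rtp_payload: bytes) -> int:
    """Return the 2-bit NRI field from the first octet of an H.264 RTP payload."""
    if not rtp_payload:
        raise ValueError("empty RTP payload")
    # Octet layout: F (1 bit) | NRI (2 bits) | Type (5 bits)
    return (rtp_payload[0] >> 5) & 0x03


def is_non_reference(rtp_payload: bytes) -> bool:
    """True if NRI is 0, i.e. the unit is not needed for reference pictures."""
    return nal_ref_idc(rtp_payload) == 0
```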
>
> Even within SVC (and you may have stated or implied this) you could make
> multiple encodings
> with differently-targeted layers - for example a "low" set targeting
> bandwidths <500K, and a
> high set for 500K and up. This might help avoid the problem of too many
> layers degrading
> the bit-efficiency of the encoding.
I try to make clear that I see combining SVC for medium and fine-grained
scalability while using different SVC encodings for the large steps and
send them as simulcasts to improve the efficiency of the system as an
option.
>
> 3.2. Simulcasting to Consuming End-Point
>
>
> This section is clearly based on the assumption (up front) that the
> streams are in fact identical
> content with different encodings. Not a problem, especially if the
> target of the document is
> a little clearer.
Yes, we always target simulcast where the original source of the content
is the same, only the encoded representation is different.
>
> 3.3 and 3.4: These assume no mixer, right?
Yes.
>
> 4.3: negotiation of transmission side - how does an endpoint indicate
> these are "equivalent"
> streams and not "alternative" streams (different camera/mic, etc). How
> are the differing
> encodings negotiated, or is that out of scope? (I'd think it would be
> in-scope, because the mixer
> is the one who knows who's connected to it and their capabilities.)
I hope you got this explained when you read on. We have the SCS and SCR
grouping semantics to indicate that media lines are simulcast
alternatives to each other. The actual SSRCs in the different RTP
sessions are paired as alternatives using the SRCNAME SDES item. In
practice, media streams of the same media type that carry identical
SRCNAME values are alternatives to each other. The negotiation of which
alternative streams to use is done as regular offer/answer for each
media line, thus allowing us to avoid re-inventing that mechanism.
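
A rough sketch of the pairing rule, under the assumption that the
receiver has already collected the SRCNAME for each SSRC from RTCP
SDES. The dictionary keys and the function name are invented for the
example and are not taken from the draft.

```python
from collections import defaultdict


def group_simulcast_alternatives(streams):
    """Group SSRCs that share both media type and SRCNAME.

    `streams` is an iterable of dicts with the (hypothetical) keys
    "ssrc", "media" and "srcname". SSRCs that end up in the same
    group are treated as alternatives of one media source.
    """
    groups = defaultdict(list)
    for stream in streams:
        groups[(stream["media"], stream["srcname"])].append(stream["ssrc"])
    return dict(groups)
```

Two video SSRCs from different RTP sessions with the same SRCNAME would
land in one group, while an audio SSRC with that SRCNAME stays separate
because the media type differs.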
>
> 4.5: SDP signaling and bandwidth - b=AS could be used in theory for
> either direction, however
> the general offer/answer semantic is that you define what you're
> willing/ready to receive, not send,
> so b=AS would be the downstream direction, not the highest of upstream
> or downstream - I don't
> know of any that use it that way today. Some use upstream (perhaps
> most), some use downstream.
Yes, I think there is great confusion around how the bandwidth modifiers
are actually used. Our examples at the end do use b=AS as receive
direction only. The discussion in this section may not be as consistent.
I think the main point from our side is that there is a need for
additional bandwidth signalling attributes to allow one to express
asymmetric capabilities vs desire and do O/A negotiation of it, and to
do that both for individual streams and encoding proposals and for the
whole session aggregate.
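
To make the limitation concrete, here is a minimal sketch that pulls
the b= lines out of an SDP description and separates session level
from media level. The function name and return shape are assumptions
for the example. Note that nothing in a b= line says which direction
or which individual stream the value applies to.

```python
def bandwidth_modifiers(sdp: str):
    """Collect b=<bwtype>:<value> lines from an SDP description.

    Returns (session, media): `session` maps bwtype to value for b= lines
    before the first m= line; `media` is a list of (media_type, dict)
    pairs, one per m= line, in order of appearance.
    """
    session = {}
    media = []
    current = session
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            current = {}
            media.append((line[2:].split()[0], current))
        elif line.startswith("b="):
            bwtype, _, value = line[2:].partition(":")
            current[bwtype] = int(value)
    return session, media
```

Given a session-level b=AS:2000 and a video media line with b=AS:1500,
this yields one aggregate number and one per-media number, but no way
to tell send from receive or one SSRC from another.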
> Also there should be discussion of how RTCP TMMBR fits into this.
Likely. From my perspective, TMMBR has quite clear semantics as it
applies to a particular SSRC. Thus the stream and direction for a given
TMMBR limitation are clear. But I guess it is good to understand what
the upper limit for that stream may be.
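
As an illustration of why the target of a TMMBR limit is unambiguous,
each TMMBR FCI entry names the SSRC it applies to, followed by a 6-bit
exponent, a 17-bit mantissa and a 9-bit measured overhead. A sketch of
decoding one 8-byte entry follows; the function name is an assumption
for the example.

```python
import struct


def decode_tmmbr_entry(fci: bytes):
    """Decode one 8-byte TMMBR FCI entry.

    Returns (ssrc, max_bitrate_bps, overhead_bytes), where the maximum
    total media bit rate is mantissa * 2**exponent.
    """
    if len(fci) != 8:
        raise ValueError("a TMMBR FCI entry is exactly 8 bytes")
    ssrc, word = struct.unpack("!II", fci)
    exponent = word >> 26               # top 6 bits
    mantissa = (word >> 9) & 0x1FFFF    # next 17 bits
    overhead = word & 0x1FF             # low 9 bits
    return ssrc, mantissa << exponent, overhead
```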
>
> nit in 6.2:
>
> When one have multiple actual media sources in a session,
>
> have->has
Ok
>
> 7.1.2: SSRC Multiplexing
>
> Note that "legacy" receivers will often be much happier with NO SSRC change when the
> mixer switches streams. While in theory they should handle them, not all do. And even
> if they do, they may cause glitches at the transition (decoder resets, etc). Yes, they
> shouldn't do that.
Fully agree, and as this is an RTP mixer I assume that the mixer will
have its own SSRC value that it uses to send whatever stream it mixes
or selects from the incoming ones. The content of the media changes, but
it is sent as a consistent media stream from the mixer's SSRC. Thus we
don't have SSRC switching. The issue I bring up is that of the CSRC
field. If one includes CSRCs, then that field changes.
But, in fact I think this is mostly a moot argument. Also for session
multiplexing we will likely have this issue, as using different SSRCs in
the different RTP sessions has advantages.
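
To show where the two identifiers sit, here is a minimal sketch that
reads the SSRC and the CSRC list from an RTP fixed header. The function
name is made up for the example. With a mixer as described above, the
SSRC returned stays the same across a source switch while the CSRC
list is what changes.

```python
import struct


def parse_rtp_sources(packet: bytes):
    """Return (ssrc, csrc_list) from an RTP packet's fixed header."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the 12-byte RTP fixed header")
    if packet[0] >> 6 != 2:
        raise ValueError("not RTP version 2")
    csrc_count = packet[0] & 0x0F            # CC field, low 4 bits of octet 0
    end = 12 + 4 * csrc_count
    if len(packet) < end:
        raise ValueError("packet truncated inside the CSRC list")
    ssrc = struct.unpack_from("!I", packet, 8)[0]
    csrcs = list(struct.unpack_from("!%dI" % csrc_count, packet, 12))
    return ssrc, csrcs
```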
>
>
> NAT Traversal Failure Rate: Due to that one need more than a single
> flow to be established through the NAT there is some risk that one
> succeed in establishing the first flow but fails with one or more
> of the additional flows. The risk that this happens are hard to
> quantify. However, that risk should be fairly low as one has just
> prior successfully established one flow from the same interfaces.
> Thus only rare events as NAT resource overload, or selecting
> particular port numbers that are filtered etc, should be reasons
> for failure.
>
> As the number of flows increases, the odds of a lost packet (or congestion)
> increase. The pacing timer should help with this, but even so I believe the
> odds of problems will increase above a linear rate. However, this is an
> off-the-cuff assessment and I haven't followed the ICE discussions (N zillion MB
> of them...).
Maybe, and I would love to know how real the issue is. Also, to a
certain degree this is an orthogonal problem if it is real, because it
will also occur for SSRC multiplexing: even that case has multiple
underlying UDP flows, just a few fewer than session-multiplexed simulcast.
>
> 8.2. Mixer Requests of Client streams
>
> I'd guess TMMBR of 0 may not go over well with some existing clients, or may be ignored.
Very much possible. We plan to follow up on this topic in the future.
>
> 8.4 Multiplexing sessions
>
> This touches on the discussions currently in rtcweb
Yes, it does. Interest in multiplexing multiple sessions on top of one
underlying transport flow has also been voiced in the discussions around
CLUE. My main point is that it is a separate problem that needs to be
addressed somehow. If you are interested in this problem I would
recommend that you read the next version of:
https://datatracker.ietf.org/doc/draft-perkins-rtcweb-rtp-usage/
It will be out on Monday. It tries to make clear the issues with
Rosenberg's proposal and discusses possible ways forward for that topic.
https://datatracker.ietf.org/doc/draft-rosenberg-rtcweb-rtpmux/
>
> 8.5.1.2 Multiple sources (telepresence)
>
> Ok, this confuses me as the discussion I thought was of simultaneous encodings
> of the same data, not true different sources.
Well, this is the case that tries to make it clear how you can combine
multiple real media sources, like several cameras in a telepresence room,
with simulcast.
>
> At this point I'm afraid my eyes have glazed over reading the (very nicely detailed
> and explained) SDP examples.
Yes, those examples are quite beefy. But I do hope that they will help
people understand. Writing them also helped us confirm that we had a
coherent story for many of the different usages we see for what we propose.
Cheers
Magnus Westerlund
----------------------------------------------------------------------
Multimedia Technologies, Ericsson Research EAB/TVM
----------------------------------------------------------------------
Ericsson AB | Phone +46 10 7148287
Färögatan 6 | Mobile +46 73 0949079
SE-164 80 Stockholm, Sweden| mailto: magnus.westerlund <at> ericsson.com
----------------------------------------------------------------------