From: Randy Stewart <randall <at> lakerest.net>
To: Chinmaya Dwibedy <ckdwibedy <at> yahoo.com>
Cc: Chris Benson <cbenson <at> adax.com>; "tsvwg <at> ietf.org" <tsvwg <at> ietf.org>
Sent: Friday, June 1, 2012 5:05 AM
Subject: Re: [tsvwg] When SCTP server is restarted, DATA chunks start to get dropped or lost after sometime at SCTP client’s end.
Chinmaya:
I would use that as a starting point. Don't put a time-to-expire on it, and see whether
the behavior goes away. If you still see it, then it sounds like a bug in the Linux kernel
implementation.
R
On May 31, 2012, at 3:20 PM, Chinmaya Dwibedy wrote:
> Hi Chris,
> Thanks for your response.
> Our application consumes the messages fast enough. Note that we are using a single Intel Xeon Quad-Core L5518 processor with 8 logical CPUs (2.13 GHz) and 12 GB of RAM. Moreover, no other applications are running besides ours. This problem appears only after the SCTP server gets restarted; there are no packet drops at all if we do not restart the server.
>
> Do you recommend that we try a timetolive of 0 ms, which indicates that no timeout should occur on this message, so that it is treated as reliable?
>
> Regards,
> Chinmaya
>
> From: Chris Benson <cbenson <at> adax.com>
> To: Chinmaya Dwibedy <ckdwibedy <at> yahoo.com>
> Cc: "tsvwg <at> ietf.org" <tsvwg <at> ietf.org>
> Sent: Friday, June 1, 2012 12:13 AM
> Subject: Re: [tsvwg] When SCTP server is restarted, DATA chunks start to get dropped or lost after sometime at SCTP client’s end.
>
> Chinmaya,
>
> Please forgive me if this is too basic and obvious a question.
>
> Are you sure that your "application" (SCTP-user) at the client
> side is consuming the incoming data AT LEAST AS FAST as it arrives?
>
> If not, then you will see something like the behaviour you
> describe. With thanks, from Chris
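Chris's point can be illustrated with a toy model of the receive window. The rwnd a receiver advertises is its buffer space minus the data the application has not read yet. If data arrives faster than the application reads it, rwnd shrinks toward zero and the sender gets throttled.

The following is a minimal sketch under stated assumptions: constant per-millisecond arrival and read rates, and a single receive buffer. The function name and parameters are hypothetical and are not part of any SCTP API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy model, not an SCTP API: returns the rwnd a receiver would
 * advertise after `ms` milliseconds, given a receive buffer of buf_bytes,
 * data arriving at arrive_per_ms bytes/ms and the application reading at
 * read_per_ms bytes/ms. Arrivals beyond the free space are simply not counted. */
uint32_t advertised_rwnd_after(uint32_t buf_bytes, uint32_t arrive_per_ms,
                               uint32_t read_per_ms, uint32_t ms)
{
    uint64_t queued = 0; /* bytes received but not yet read by the application */

    for (uint32_t t = 0; t < ms; t++) {
        uint64_t space = buf_bytes - queued;
        uint64_t in = arrive_per_ms < space ? arrive_per_ms : space;

        queued += in;                     /* data lands in the receive buffer */
        assert(queued <= buf_bytes);      /* the buffer never overflows */
        queued = queued > read_per_ms ? queued - read_per_ms : 0; /* app reads */
    }
    return (uint32_t)(buf_bytes - queued); /* free space = advertised rwnd */
}
```

For example, take a hypothetical 1000-byte buffer with 20 bytes arriving and 10 bytes read per millisecond. The advertised window falls to 500 bytes after 50 ms and settles at 10 bytes. If the application stops reading altogether, the window reaches 0.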
>
>
> On Thu, 31 May 2012, Chinmaya Dwibedy wrote:
>
> >> Date: Thu, 31 May 2012 11:35:17 -0700 (PDT)
> >> From: Chinmaya Dwibedy <ckdwibedy <at> yahoo.com>
> >> To: "tsvwg <at> ietf.org" <tsvwg <at> ietf.org>
> >> Subject: [tsvwg] When SCTP server is restarted,
> >> DATA chunks start to get dropped or lost after sometime at SCTP client
> >> end.
> >>
> >> Hi,
> We have an SCTP client program (C++) running under RHEL4 (kernel version 2.6.9-55ELsmp and lksctp-tools-1.0.8-1). We use a one-to-one style socket and the connect() system call to set up an association with an SCTP server (i.e., the vendor's SUT, system under test). The client then calls the sctp_sendmsg() library function to send messages on the socket, using the following advanced SCTP features: SCTP_UNORDERED for unordered delivery of the message, and a timetolive (TTL) of 1000 milliseconds for each message. We use the typical default RTO values: RTO.Initial = 3000, RTO.Min = 1000, and RTO.Max = 60000 milliseconds. The PR-SCTP extension is enabled in the kernel (by default).
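For reference, RFC 4960, Section 6.3.1 computes the RTO from RTT measurements as SRTT + 4 * RTTVAR, with RTO.Alpha = 1/8 and RTO.Beta = 1/4. It rounds the result up to RTO.Min and may cap it at RTO.Max. Section 6.3.3 doubles the RTO on each T3-rtx timeout.

The following is a minimal sketch of those rules in C, using the RTO values quoted above. The struct and function names are hypothetical, and a real stack keeps this state per destination address.

```c
#include <assert.h>
#include <stdbool.h>

/* RTO parameters quoted in this thread, in milliseconds. */
#define RTO_INITIAL_MS 3000.0
#define RTO_MIN_MS     1000.0
#define RTO_MAX_MS     60000.0
#define RTO_ALPHA      0.125  /* RTO.Alpha = 1/8 */
#define RTO_BETA       0.25   /* RTO.Beta  = 1/4 */

/* Hypothetical per-destination RTO state (not a kernel structure). */
struct rto_state {
    bool   have_rtt;   /* false until the first RTT measurement */
    double srtt_ms;
    double rttvar_ms;
    double rto_ms;
};

/* Rule C6: round up to RTO.Min; rule C7: cap at RTO.Max. */
static double clamp_rto(double rto_ms)
{
    if (rto_ms < RTO_MIN_MS)
        return RTO_MIN_MS;
    if (rto_ms > RTO_MAX_MS)
        return RTO_MAX_MS;
    return rto_ms;
}

/* Before any RTT measurement, RTO = RTO.Initial. */
void rto_init(struct rto_state *s)
{
    s->have_rtt = false;
    s->srtt_ms = 0.0;
    s->rttvar_ms = 0.0;
    s->rto_ms = RTO_INITIAL_MS;
}

/* Rules C1 (first measurement) and C2 (subsequent measurements). */
void rto_on_rtt_sample(struct rto_state *s, double r_ms)
{
    assert(r_ms >= 0.0);
    if (!s->have_rtt) {
        s->srtt_ms = r_ms;
        s->rttvar_ms = r_ms / 2.0;
        s->have_rtt = true;
    } else {
        /* RTTVAR is updated with the SRTT value from before this sample. */
        double diff = s->srtt_ms - r_ms;
        if (diff < 0.0)
            diff = -diff;
        s->rttvar_ms = (1.0 - RTO_BETA) * s->rttvar_ms + RTO_BETA * diff;
        s->srtt_ms = (1.0 - RTO_ALPHA) * s->srtt_ms + RTO_ALPHA * r_ms;
    }
    s->rto_ms = clamp_rto(s->srtt_ms + 4.0 * s->rttvar_ms);
}

/* Section 6.3.3, rule E2: back off the timer on a T3-rtx expiry. */
void rto_on_t3_timeout(struct rto_state *s)
{
    s->rto_ms = clamp_rto(s->rto_ms * 2.0);
}
```

Assume a hypothetical LAN round-trip time of 1 ms. Then SRTT + 4 * RTTVAR is only 3 ms, so the RTO is pinned at RTO.Min = 1000 ms, the same value as the 1000 ms timetolive.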
> We are using a large number (4000) of SCTP associations on a single system. All are in the ESTABLISHED state, and data communication goes on fine without any drop or loss of messages. But when the peer (i.e., the SCTP server) gets restarted:
> a) All the SCTP clients receive INIT messages from the SCTP server (with the same addresses as the existing association). For each association, an INIT ACK is sent with a new InitTag, and with its Verification Tag set to the InitTag received in the INIT.
> b) The four-way handshake completes. I believe the existing association, including its current state and the corresponding TCB, does not get changed.
> c) Afterward, all SCTP clients use DATA chunks to exchange information with the SCTP server.
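When the restarted server's COOKIE ECHO arrives at a client that still has a TCB for the association, RFC 4960, Section 5.2.4 (Table 2) decides what to do. It compares the tags in the State Cookie with the existing TCB.

In the restart described in a) and b), two conditions hold:
- the client's new InitTag and the server's new tag both differ from the TCB;
- the Tie-Tags match the old tags.

That combination is case (A). For case (A), the RFC says the existing session is treated as if an ABORT had been received followed by a new COOKIE ECHO, with two exceptions: DATA chunks MAY be retained, and a RESTART notification SHOULD be sent to the ULP.

A minimal sketch of the table follows, as we read it. The enum and function names are hypothetical, and the inputs are the already-computed comparisons rather than raw tags.

```c
#include <assert.h>

/* How a field in the received State Cookie compares with the existing TCB. */
enum tag_cmp {
    TAG_M,  /* M: matches the existing TCB */
    TAG_X,  /* X: does not match the existing TCB */
    TAG_0   /* 0: zero / not known (e.g., no Tie-Tag in the cookie) */
};

enum cookie_action {
    ACTION_A,    /* (A) the peer may have restarted */
    ACTION_B,    /* (B) both sides started INIT at about the same time */
    ACTION_C,    /* (C) our cookie arrived late: silently discard it */
    ACTION_D,    /* (D) both tags match: send COOKIE ACK */
    ACTION_NONE  /* combination not listed in Table 2 (this sketch only) */
};

enum cookie_action classify_cookie_echo(enum tag_cmp local_tag,
                                        enum tag_cmp peer_tag,
                                        enum tag_cmp local_tie_tag,
                                        enum tag_cmp peer_tie_tag)
{
    /* Table 2 only uses M or X for the Local Tag column. */
    assert(local_tag != TAG_0);

    if (local_tag == TAG_X && peer_tag == TAG_X &&
        local_tie_tag == TAG_M && peer_tie_tag == TAG_M)
        return ACTION_A;
    if (local_tag == TAG_M && (peer_tag == TAG_X || peer_tag == TAG_0))
        return ACTION_B;   /* Tie-Tags: any value */
    if (local_tag == TAG_X && peer_tag == TAG_M &&
        local_tie_tag == TAG_0 && peer_tie_tag == TAG_0)
        return ACTION_C;
    if (local_tag == TAG_M && peer_tag == TAG_M)
        return ACTION_D;   /* Tie-Tags: any value */
    return ACTION_NONE;
}
```

In the restart scenario, `classify_cookie_echo(TAG_X, TAG_X, TAG_M, TAG_M)` yields `ACTION_A`.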
> The trouble is that after some time (30-45 minutes), DATA chunks start to get dropped or lost, and the losses keep increasing as time advances. Some of the data was never transmitted to the peer, not even once. Note that the time after which this problem appears is not consistent: we have seen it after 30 minutes, after 45 minutes, and after an hour. When we reduce the number of SCTP associations to 2000, the problem does not appear.
>
>
> a) Is there any chance that data messages time out within the RTO, given that both values (the 1000 ms timetolive and RTO.Min) are the same? If so, the data would be skipped and no longer transmitted.
>
> b) Does it indicate that the message could never be transmitted to the peer (e.g., flow control prevented the message from being sent before its lifetime expired), so the peer never received it? Please note that we are using a 1 Gigabit Intel NIC.
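Question b) hinges on when the sender is allowed to put new DATA on the wire. RFC 4960, Section 6.1 gives two rules:
- Rule A forbids new data when the peer's rwnd is 0, except that one DATA chunk may always be in flight if cwnd allows.
- Rule B forbids new data to a destination that already has cwnd or more bytes outstanding.

The following is a simplified, single-destination sketch. The function name is hypothetical. Under the timetolive semantics described in this thread, a message waiting behind this gate has not yet been put on the wire, so its lifetime keeps running.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: may the sender transmit NEW data to this destination?
 * peer_rwnd:   receiver window last advertised by the peer (bytes)
 * flight_size: bytes sent but not yet acknowledged (bytes)
 * cwnd:        congestion window for this destination (bytes) */
bool may_send_new_data(uint32_t peer_rwnd, uint32_t flight_size, uint32_t cwnd)
{
    assert(cwnd > 0);               /* SCTP never shrinks cwnd to zero */

    if (flight_size >= cwnd)        /* rule B: cwnd or more bytes outstanding */
        return false;
    if (peer_rwnd == 0)             /* rule A: zero window, but one DATA chunk */
        return flight_size == 0;    /* may still be in flight */
    return true;
}
```

For example, with a hypothetical cwnd of 4380 bytes and nothing in flight, new data may be sent even if the peer advertises rwnd = 0. Once a chunk is outstanding against that zero window, the gate stays closed until the peer opens its window again.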
> Our preliminary analysis says that timetolive (in the sctp_sendmsg() API) can be set on a message independently of Partial Reliability. The timer (i.e., sinfo_timetolive) only runs while the message is in the kernel send buffer and has not yet been put on the wire for the first time. As soon as the message is put on the wire the first time, the timer is dropped. A nonzero value indicates that a time-based lifetime is applied to the data: it is the number of milliseconds during which transmission of the data is attempted. If that many milliseconds elapse and the data has not been transmitted, the data is moved to the abandoned list and is never transmitted. Please feel free to correct me if I am wrong.
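The lifetime behavior described in the paragraph above can be modeled in a few lines:
- 0 disables the lifetime, so the message is reliable;
- the timer stops once the message has been transmitted once;
- otherwise, the message is abandoned after waiting timetolive milliseconds in the send buffer.

This is a hypothetical model of the semantics as stated in this message. The struct and function names are illustrative, not Linux kernel code. It does not model a stack that keeps applying the lifetime after the first transmission, as PR-SCTP's timed reliability can.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-message bookkeeping; not a Linux kernel structure. */
struct queued_msg {
    uint32_t enqueued_ms;    /* when the message entered the send buffer */
    uint32_t timetolive_ms;  /* sinfo_timetolive; 0 = no lifetime (reliable) */
    bool     sent_once;      /* set once the message has gone on the wire */
};

/* True if the message should be moved to the abandoned list at now_ms. */
bool ttl_expired(const struct queued_msg *m, uint32_t now_ms)
{
    assert(now_ms >= m->enqueued_ms);
    if (m->timetolive_ms == 0)      /* 0: no timeout should occur */
        return false;
    if (m->sent_once)               /* timer dropped after first transmission */
        return false;
    return now_ms - m->enqueued_ms >= m->timetolive_ms;
}
```

For example, a message with timetolive = 1000 that is still waiting in the send buffer at 1500 ms is reported as expired. The same message with timetolive = 0, or one that has already been sent once, never is.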
> Should we try a timetolive of 0 ms, which indicates that no timeout should occur on this message, so that it is treated as reliable? Please advise, and let me know if you need any additional information.
> Thanking you in advance for your response.
> Regards,
> Chinmaya
>
-----
Randall Stewart
randall <at> lakerest.net