O'Brien, Chris | 21 Oct 15:52 2014

Fragmented multicast SOCK_DGRAM and retransmission


I have a scenario where one of our nodes is rate limiting broadcast packets and drops around 50% of the
packets. This causes havoc with reliable multicast SOCK_RDM and we end up getting congestion on the
sender. I assumed I could move to an unreliable transmission socket SOCK_DGRAM to bypass this node until
we address the rate limiting.

Even when I configure the socket for unreliable, connectionless transmission, I still see the congestion and
retransmission occurring.

I was scanning through historical messages that appear to be related, dating back to June 2007: http://sourceforge.net/p/tipc/mailman/message/8455616/

I am using fragmented messages. I would like to continue to send messages and the misbehaving node can be
flagged by the application and dealt with without affecting all the other nodes. My kernels are based off
of 3.4.34/2.6.27 and I am in a mixed 2.0/1.7 environment. Any suggestions how I can work around this node?



Ying Xue | 20 Oct 09:17 2014

[PATCH net-next 00/10] standardize SKB queue operations

On the TIPC packet transmission path, the link layer maintains a singly
linked list as its outbound queue, with an sk_buff pointer recording the
queue's head. As a result, when packets built up as a doubly linked
list are sent out through a link, the doubly linked list has to be
converted to a singly linked list before it is joined to the link outbound
queue. That conversion depends on the assumption that sk_buffs have
the next and prev pointers at the beginning of the struct. Not only
might this assumption not hold, it may also prevent some improvements
to the generic networking SKB list management.

To address this, all SKB queues in the entire TIPC stack are now
managed with the standard SKB list APIs associated with struct
sk_buff_head, which also makes the relevant code cleaner. Before doing
so, we need to clean up the following redundant functionality:

- remove node subscribe infrastructure
- remove protocol message queue
- remove retransmission queue
- clean up process of pushing packets in link layer

After that, the following SKB queues are managed with the standard SKB list APIs:

- link outqueue
- link deferred queue
- link receive queue
- link transmission queue
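
To make the motivation concrete, here is a minimal userspace sketch of a queue in the style of struct sk_buff_head. It is an analogy only: struct pkt, struct pkt_queue and the pkt_* helpers are hypothetical names, not kernel APIs. The link pointers are deliberately not the first members of struct pkt, and the queue head embeds a sentinel node, so no code depends on member offsets. Once the built-up chain and the link outbound queue share this one list type, joining them is a constant-time splice with no list conversion.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff. The link pointers are NOT the
 * first members, so nothing below may rely on their offset in the struct. */
struct pkt {
	unsigned int seqno;
	struct pkt *next;
	struct pkt *prev;
};

/* Hypothetical stand-in for struct sk_buff_head: a circular doubly linked
 * list built around an embedded sentinel node that never carries data. */
struct pkt_queue {
	struct pkt head;
	unsigned int qlen;
};

static inline void pkt_queue_init(struct pkt_queue *q)
{
	q->head.next = q->head.prev = &q->head;
	q->qlen = 0;
}

/* Append one packet at the tail of the queue. */
static inline void pkt_queue_tail(struct pkt_queue *q, struct pkt *p)
{
	p->next = &q->head;
	p->prev = q->head.prev;
	q->head.prev->next = p;
	q->head.prev = p;
	q->qlen++;
}

/* Return the first packet without removing it, or NULL if the queue is empty. */
static inline struct pkt *pkt_peek(struct pkt_queue *q)
{
	return q->qlen ? q->head.next : NULL;
}

/* Remove and return the first packet, or NULL if the queue is empty. */
static inline struct pkt *pkt_dequeue(struct pkt_queue *q)
{
	struct pkt *p = pkt_peek(q);

	if (!p)
		return NULL;
	p->prev->next = p->next;
	p->next->prev = p->prev;
	p->next = p->prev = NULL;
	q->qlen--;
	return p;
}

/* Move every packet of 'list' to the tail of 'q' in O(1) and leave 'list'
 * empty. Both sides use the same list type, so no conversion is needed. */
static inline void pkt_queue_splice_tail(struct pkt_queue *list,
					 struct pkt_queue *q)
{
	struct pkt *first, *last;

	if (!list->qlen)
		return;
	first = list->head.next;
	last = list->head.prev;
	first->prev = q->head.prev;
	q->head.prev->next = first;
	last->next = &q->head;
	q->head.prev = last;
	q->qlen += list->qlen;
	pkt_queue_init(list);
}
```

A caller would build a packet chain in one pkt_queue, then hand it to the outbound queue with a single pkt_queue_splice_tail() call.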

The series is based on Richard's latest netlink patchset.

Ying Xue (10):

erik.hugne | 17 Oct 08:45 2014

[PATCH] tipc: reduce amount of duplicate nak messages

From: Erik Hugne <erik.hugne <at> ericsson.com>

If an out of sequence packet is received and it is not a duplicate,
it will be placed on the deferred queue and the peer notified that
there is a sequence gap. However, if the received packet fills a
hole in, or is placed at the end of the defer queue, a duplicate NAK
message will be sent out. This in turn will generate duplicate
retransmissions and degraded link performance.
We fix this by only sending out a new NAK when the received packet
is placed at the head of the defer queue.

Signed-off-by: Erik Hugne <erik.hugne <at> ericsson.com>

This should be applied on top of Ying's skblist patchset.
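
For readers who want to compare the two NAK conditions outside the kernel, here is a minimal userspace sketch. It is a simplified model, not the kernel code: struct defq and the helpers are hypothetical, the deferred queue is a small sorted array, and sequence-number wrap-around is ignored. nak_old_rule() mirrors the removed condition (queue length modulo 16 equals 1) and nak_new_rule() mirrors the added one (the packet became the head of the queue).

```c
#include <stdbool.h>
#include <stddef.h>

#define DEFQ_MAX 64

/* Hypothetical model of a link's deferred queue: sequence numbers kept in
 * ascending order. Wrap-around is deliberately not modelled. */
struct defq {
	unsigned int seq[DEFQ_MAX];
	size_t len;
};

/* Insert seqno in order. Returns the position it was stored at, or -1 if
 * it is a duplicate or the queue is full (i.e. the packet was not queued). */
static inline int defq_insert(struct defq *q, unsigned int seqno)
{
	size_t pos = 0;

	if (q->len == DEFQ_MAX)
		return -1;
	while (pos < q->len && q->seq[pos] < seqno)
		pos++;
	if (pos < q->len && q->seq[pos] == seqno)
		return -1;
	for (size_t i = q->len; i > pos; i--)
		q->seq[i] = q->seq[i - 1];
	q->seq[pos] = seqno;
	q->len++;
	return (int)pos;
}

/* Removed condition: NAK whenever the queue length is 1, 17, 33, ... */
static inline bool nak_old_rule(const struct defq *q)
{
	return (q->len % 16) == 1;
}

/* Added condition: NAK only if the packet became the head of the queue. */
static inline bool nak_new_rule(int pos)
{
	return pos == 0;
}

/* Count the NAKs each rule would send for a run of out-of-sequence arrivals. */
static inline unsigned int count_naks(const unsigned int *arrivals, size_t n,
				      bool new_rule)
{
	struct defq q = { .len = 0 };
	unsigned int naks = 0;

	for (size_t i = 0; i < n; i++) {
		int pos = defq_insert(&q, arrivals[i]);

		if (pos < 0)
			continue;	/* not queued, so no NAK decision */
		if (new_rule ? nak_new_rule(pos) : nak_old_rule(&q))
			naks++;
	}
	return naks;
}
```

In this model, 17 in-order arrivals behind a single gap trigger the old condition at queue lengths 1 and 17, while the new condition fires only for the first packet, which is the one that becomes the queue head.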

 net/tipc/link.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/tipc/link.c b/net/tipc/link.c
index a74be61..45f94b7 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -1395,7 +1395,7 @@ static void link_handle_out_of_seq_msg(struct tipc_link *l_ptr,
 	if (tipc_link_defer_pkt(&l_ptr->deferred_queue, buf)) {
 		TIPC_SKB_CB(buf)->deferred = true;
-		if ((skb_queue_len(&l_ptr->deferred_queue) % 16) == 1)
+		if (skb_peek(&l_ptr->deferred_queue) == buf)
 			tipc_link_proto_xmit(l_ptr, STATE_MSG, 0, 0, 0, 0, 0);

Matthew Clark | 16 Oct 19:13 2014

TIPC 2.0.0 packets not being transmitted

I'm trying to set up a TIPC cluster using a variety of ARM-based processors,
but I'm having issues with some zc706 Zynq boards running a Yocto-built
3.14.2 kernel. Some nodes see their neighbors perfectly well, but the zc706
boards can't be seen by anyone and think everyone else is down. I ran
Wireshark and, from what I can tell, the ZC706 boards simply aren't
broadcasting any packets. I see TIPC packets flying around from the overos
and zedboard, but nothing from the zynqs.

Can anyone help me debug this? I'm at a bit of a loss to explain the
behavior. Thanks!



Linux overo 3.5.7 #1 PREEMPT Tue Mar 11 09:06:14 EDT 2014 armv7l GNU/Linux

Linux zedboard 3.8.0-xilinx #4 SMP PREEMPT Thu Jul 10 15:13:36 EDT 2014
armv7l GNU/Linux

Linux zc706 3.14.2-xilinx #2 SMP PREEMPT Thu Oct 2 14:53:07 EDT 2014 armv7l

Ying Xue | 15 Oct 09:27 2014

[PATCH] tipc: fix lockdep warning when intra-node messages are delivered

When running the tipcTC and tipcTS test suites, the following lockdep
unsafe locking scenario is reported:

[ 1109.997854]
[ 1109.997988] =================================
[ 1109.998290] [ INFO: inconsistent lock state ]
[ 1109.998575] 3.17.0-rc1+ #113 Not tainted
[ 1109.998762] ---------------------------------
[ 1109.998762] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[ 1109.998762] swapper/7/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
[ 1109.998762]  (slock-AF_TIPC){+.?...}, at: [<ffffffffa0011969>] tipc_sk_rcv+0x49/0x2b0 [tipc]
[ 1109.998762] {SOFTIRQ-ON-W} state was registered at:
[ 1109.998762]   [<ffffffff810a4770>] __lock_acquire+0x6a0/0x1d80
[ 1109.998762]   [<ffffffff810a6555>] lock_acquire+0x95/0x1e0
[ 1109.998762]   [<ffffffff81a2d1ce>] _raw_spin_lock+0x3e/0x80
[ 1109.998762]   [<ffffffffa0011969>] tipc_sk_rcv+0x49/0x2b0 [tipc]
[ 1109.998762]   [<ffffffffa0004fe8>] tipc_link_xmit+0xa8/0xc0 [tipc]
[ 1109.998762]   [<ffffffffa000ec6f>] tipc_sendmsg+0x15f/0x550 [tipc]
[ 1109.998762]   [<ffffffffa000f165>] tipc_connect+0x105/0x140 [tipc]
[ 1109.998762]   [<ffffffff817676ee>] SYSC_connect+0xae/0xc0
[ 1109.998762]   [<ffffffff81767b7e>] SyS_connect+0xe/0x10
[ 1109.998762]   [<ffffffff817a9788>] compat_SyS_socketcall+0xb8/0x200
[ 1109.998762]   [<ffffffff81a306e5>] sysenter_dispatch+0x7/0x1f
[ 1109.998762] irq event stamp: 241060
[ 1109.998762] hardirqs last  enabled at (241060): [<ffffffff8105a4ad>] __local_bh_enable_ip+0x6d/0xd0
[ 1109.998762] hardirqs last disabled at (241059): [<ffffffff8105a46f>] __local_bh_enable_ip+0x2f/0xd0
[ 1109.998762] softirqs last  enabled at (241020): [<ffffffff81059a52>] _local_bh_enable+0x22/0x50
[ 1109.998762] softirqs last disabled at (241021): [<ffffffff8105a626>] irq_exit+0x96/0xc0
[ 1109.998762]
[ 1109.998762] other info that might help us debug this:

Erik Hugne | 15 Oct 08:54 2014

Problem with wrapping sequence numbers

Way back, I posted some early results/code that adds GRO capabilities to TIPC.
This has little to no impact on 1G links (slightly less CPU overhead for
packet processing), but for 10G and above, we saw a throughput increase of up
to 30%.
The first draft only worked on SOCK_STREAM data, but after Jon moved all
message fragmentation down to the TIPC link layer (thanks for that!) I rewrote
it to work on MSG_FRAGMENTER type packets. The GRO code leverages the fact that
fragments are sent back-to-back, and usually arrive in-order with minimal loss.
Each link will at any time have zero or one active flow, and a flow is flushed
either when we receive a lastfrag, the received fragment ID (or "long msgno")
does not match the expected one, or when the device layer feels like it.

During testing of this, I was simulating varying amounts of packet loss and
reordering when I noticed that if a packet has been deferred, and the link
sequence numbers wrap around before this packet is flushed back into the receive
chain, the sequence number gap calculation goes bananas. I believe this is a
problem in the baseline (without GRO) as well.

This small patch to the tipc_link_proto_xmit() STATE_MSG handling seems to
correct the issue:
                        u32 rec = buf_seqno(skb_peek(&l_ptr->deferred_queue));
-                       gap = mod(rec - mod(l_ptr->next_in_no));
+                       if (rec > mod(l_ptr->next_in_no))
+                               gap = mod(rec - mod(l_ptr->next_in_no));
+                       else
+                               gap = mod(mod(l_ptr->next_in_no) - rec);

If anyone confirms this, we should probably post a correction to -net asap.
Otherwise I will include it along with the complete GRO set.
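
For anyone who wants to check the arithmetic in userspace, here is a minimal sketch of the two gap expressions from the diff above. It assumes that link sequence numbers are 16 bits wide and that mod() simply masks to that width, which is how I recall the helper in net/tipc/link.h; gap_baseline() and gap_patched() are hypothetical names used only for this comparison.

```c
#include <stdint.h>

/* Assumption: 16-bit link sequence numbers, with mod() masking to that
 * width (as recalled from net/tipc/link.h, not verified here). */
static inline uint32_t mod(uint32_t x)
{
	return x & 0xffffu;
}

/* The expression on the removed line: always "rec minus next", modulo 2^16. */
static inline uint32_t gap_baseline(uint32_t rec, uint32_t next_in_no)
{
	return mod(rec - mod(next_in_no));
}

/* The expression on the added lines: subtract the smaller value from the
 * larger one, based on a plain (non-modular) comparison. */
static inline uint32_t gap_patched(uint32_t rec, uint32_t next_in_no)
{
	if (rec > mod(next_in_no))
		return mod(rec - mod(next_in_no));
	return mod(mod(next_in_no) - rec);
}
```

Feeding both functions values on either side of the wrap point (for example rec just above 0 while next_in_no is just below 65536, and the reverse) shows where they diverge. Which result the receiver needs depends on whether the head of the deferred queue can ever be behind next_in_no, which this sketch does not model.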


Ying Xue | 15 Oct 04:14 2014

[PATCH next-net v3] rps: support tipc

The TIPC protocol introduces an abstract link layer that provides a
reliable message delivery mechanism for connection-based,
connectionless, and multicast messages. This means that message
sequence control and retransmission are handled at the link layer
rather than at a common transport layer, as they are for TCP or SCTP.
Therefore, the target CPU for RPS is determined by calculating a flow
hash over a 2-tuple (i.e., previous and destination node address).

However, the TIPC message header has several different formats, so we
have to consider them separately:

- Multicast message. In the multicast message header, the previous node
  address varies, but the destination node address is always 0. To
  make RPS treat all multicast messages as one flow and steer them to
  one CPU, 0xffffffff and 0 are used as their previous and
  destination node addresses respectively.

- Unicast message. There are two types of unicast message:
  connection-oriented and connectionless. The former contains a
  valid source node address, but doesn't include a destination node
  address; the latter contains both. To ensure that all unicast messages
  with the same destinations are steered to one target CPU, the source
  node address and 0 are used as the flow source and destination
  addresses respectively.
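
The tuple selection described in the two cases above can be sketched in userspace as follows. This is an illustration under stated assumptions, not the flow dissector code from the patch: struct flow_tuple, tipc_flow_tuple(), flow_hash() and pick_cpu() are hypothetical names, and flow_hash() is a placeholder mixing function rather than the hash the kernel uses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 2-tuple fed to the flow hash. */
struct flow_tuple {
	uint32_t src;
	uint32_t dst;
};

/* Pick the tuple for a message, following the two cases in the text:
 * multicast -> (0xffffffff, 0), unicast -> (source node address, 0). */
static inline struct flow_tuple tipc_flow_tuple(bool is_mcast, uint32_t src_node)
{
	struct flow_tuple t;

	t.src = is_mcast ? 0xffffffffu : src_node;
	t.dst = 0;
	return t;
}

/* Placeholder mixing function, for illustration only. */
static inline uint32_t flow_hash(struct flow_tuple t)
{
	uint32_t h = t.src * 2654435761u;

	h ^= t.dst + 0x9e3779b9u + (h << 6) + (h >> 2);
	return h;
}

/* Map a tuple to a CPU index; ncpus must be non-zero. */
static inline unsigned int pick_cpu(struct flow_tuple t, unsigned int ncpus)
{
	return flow_hash(t) % ncpus;
}
```

Because every multicast message maps to the same tuple regardless of sender, all multicast traffic lands on one CPU in this model, and all unicast messages from a given source node land on one CPU as well.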

Signed-off-by: Ying Xue <ying.xue <at> windriver.com>
 net/core/flow_dissector.c |   22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)


Jon Maloy | 10 Oct 21:31 2014

[RFC: 0/6] tipc: resolve message disordering at reception

When TIPC is receiving messages from multi-threaded devices it may
occasionally deliver messages to their destination sockets in the
wrong order. This happens despite correct resequencing at the link
layer, because the upcall path from link to socket is done with
no locks held.

The commits in this series solve the problem by introducing an
'input' message queue in each link, through which messages must
be delivered to the upper layers.

Jon Maloy (5):
  tipc: resolve race problem during message reception
  tipc: eliminate socket wakeup message queue from node struct
  tipc: simplify connection abort notifications when links break
  tipc: simplify socket multicast reception
  tipc: eliminate race condition at multicast msg reception

Ying Xue (1):
  tipc: fix a potential deadlock

 net/tipc/bcast.c      |  99 +++++++++-------------------
 net/tipc/bcast.h      |  18 ------
 net/tipc/core.h       |   1 +
 net/tipc/link.c       | 174 +++++++++++++++++++++++++++-----------------------
 net/tipc/link.h       |   6 +-
 net/tipc/msg.c        |  33 ++++++++++
 net/tipc/msg.h        |   2 +-
 net/tipc/name_distr.c |  29 +++++----
 net/tipc/name_distr.h |   2 +-
 net/tipc/name_table.c |  42 +++++++++++-

richard.alpe | 10 Oct 14:38 2014

[PATCH v6 00/14] tipc: new netlink API

From: Richard Alpe <richard.alpe <at> ericsson.com>

net-next is closed; here is what I plan to send in once it opens.

The old API is not removed.

The new API is separated from the old because of a bug in the old
tipc-config utility using it. When adding commands to the existing
genl_ops struct the get-family response message grows to a point where
it overflows the small receive buffer in tipc-config, subsequently
breaking the tool. Hence the two genl_family and genl_ops structs.

Redesigned the "socket list" command to address David Miller's comments
on net-next v1 of this patchset.

Simply put, the problem is that we can have an arbitrary number of
sockets, each with an arbitrary number of associated publications. In the
previous patchset this was solved by nesting as many publications as
possible into a socket. If they didn't all fit, the same socket was sent
again with the remaining publications. As David Miller pointed out, this
makes each message malformed, as the receiver cannot tell from the data
itself whether it has received a complete set. Completeness was flagged
outside of the data and the client did the reassembly.

o socket 1
  o publ 1
  o publ 2

Richard Alpe | 8 Oct 14:02 2014

Re: [PATCH v2 net-next 15/15] tipc: remove old ASCII netlink API

On 10/06/2014 11:47 PM, Jon Paul Maloy wrote:
> I sort of expected that answer. Just resend the other ones so we get
> them in now (we are at rc7+). We can try to figure out if we can do a
> kernel-internal translation later.
> ///jon
Alright, will do. However, I want to split the tipc_config.h file and put
the new API things in tipc_netlink.h. Ideally I would also like to put
deprecation warnings on the old API.

Another thing, and the reason for this mail.

Older versions of tipc-config have a very low limit on their receive buffer
(256). This triggers a fatal sanity check when using tipc-config with
the new API in place. The reason is that the response to the initial
"get family" query is now bigger, as it contains a list of supported
operations, which has grown.

As it is totally legit for the kernel API to be extended with more
commands, I think we can consider this a bug in tipc-config. However,
it's worth noting that this effectively breaks tipc-config if you only
upgrade the kernel. What do you think?

(commit dd96c2f in tipc-utils fixes this)



richard.alpe | 2 Oct 13:36 2014

[PATCH v5 00/15] tipc: new netlink API

From: Richard Alpe <richard.alpe <at> ericsson.com>

Redesigned the "socket list" command to address David Miller's comments
on net-next v1 of this patchset.

Simply put, the problem is that we can have an arbitrary number of
sockets, each with an arbitrary number of associated publications. In the
previous patchset this was solved by nesting as many publications as
possible into a socket. If they didn't all fit, the same socket was sent
again with the remaining publications. As David Miller pointed out, this
makes each message malformed, as the receiver cannot tell from the data
itself whether it has received a complete set. Completeness was flagged
outside of the data and the client did the reassembly.

o socket 1
  o publ 1
  o publ 2
o socket 1
  o publ 3
  o publ 4

In this patchset I have divided the socket listing and publication
listing to avoid having nested data of arbitrary size.

TIPC_NL_SOCK_GET now dumps all sockets with any nested connection
information. However, it no longer includes publication information,
only a HAS_PUBL flag to indicate whether the socket has publications or
not. To complement this there is a new command, TIPC_NL_PUBL_GET, which
takes a socket as argument and dumps all publications associated with