O'Brien, Chris | 31 Oct 18:05 2014

soft lockup on bearer blocked

I know the bearer area has been changing a lot lately, but I wanted to share an observation from using TIPC
2.0 from kernel 3.10. Maybe someone else will find the info useful.

I ran into the soft lockup below on our node while disabling the bearer. While digging, I found a
controversial patch that doesn't appear to have made it into the load. The patch is
titled "[PATCH net-next v3 01/12] tipc: convert 'blocked' flag in struct tipc_bearer to atomic_t".
Applying the patch addresses the issue.


sh-4.2# ifconfig eth0 down
tipc: Blocking bearer <eth:eth0.16>
tipc: Lost link <1.1.25:eth0.16-1.1.16:eth0.16> on network plane B
tipc: Lost contact with <1.1.16>
tipc: Lost link <1.1.25:eth0.16-1.1.12:eth0.16> on network plane B
tipc: Lost contact with <1.1.12>
tipc: Lost link <1.1.25:eth0.16-1.1.13:eth0.16> on network plane B
tipc: Lost contact with <1.1.13>
tipc: Lost link <1.1.25:eth0.16-1.1.6:eth0.16> on network plane B
tipc: Lost contact with <1.1.6>


Ying Xue | 31 Oct 08:43 2014

[PATCH net-next 0/9] Convert name table read-write lock to RCU

Currently the TIPC name table is statically allocated and protected by a
read-write lock. To improve the performance of name table lookups, we
are going to use RCU to protect the name table. After the conversion,
concurrent lookups on the read side are lockless. However, before the
conversion happens, the following changes must be made first:

 - change allocation of name table from static way to dynamic way
 - fix several incorrect locking policy issues
 - lastly, convert the read-write lock to RCU

Ying Xue (9):
  tipc: remove size variable from publ_list struct
  tipc: make name table allocated dynamically
  tipc: ensure all name sequences are released when name table is
  tipc: ensure all name sequences protected with its lock
  tipc: any name table member must be protected under name table lock
  tipc: simplify relationship between name table lock and node lock
  tipc: fix incorrect locking policy for publication list of socket
  tipc: remove unnecessary INIT_LIST_HEAD
  tipc: convert name table read-write lock to RCU

 include/linux/rculist.h |    9 ++
 net/tipc/name_distr.c   |   81 +++++-----------
 net/tipc/name_table.c   |  242 ++++++++++++++++++++++++++---------------------
 net/tipc/name_table.h   |   23 ++++-
 net/tipc/socket.c       |   52 ++++++----
 net/tipc/subscr.c       |    1 -
 6 files changed, 222 insertions(+), 186 deletions(-)

O'Brien, Chris | 21 Oct 15:52 2014

Fragmented multicast SOCK_DGRAM and retransmission


I have a scenario where one of our nodes is rate limiting broadcast packets and drops around 50% of the
packets. This causes havoc with reliable multicast SOCK_RDM and we end up getting congestion on the
sender. I assumed I could move to an unreliable transmission socket SOCK_DGRAM to bypass this node until
we address the rate limiting.

Even when I configure the socket for unreliable connectionless I still see the congestion and
retransmission occurring.

I was scanning through historical messages that appear to be related, dating back to June 2007: http://sourceforge.net/p/tipc/mailman/message/8455616/

I am using fragmented messages. I would like to continue to send messages and the misbehaving node can be
flagged by the application and dealt with without affecting all the other nodes. My kernels are based off
of 3.4.34/2.6.27 and I am in a mixed 2.0/1.7 environment. Any suggestions how I can work around this node?



Ying Xue | 20 Oct 09:17 2014

[PATCH net-next 00/10] standardize SKB queue operations

On the TIPC packet transmission path, the link layer maintains a singly
linked list as its outbound queue, with a sk_buff struct pointer recording
the queue's head. This means that when packets built up as a doubly linked
list are sent out through a link, the doubly linked list has to be
converted to a singly linked list before it is joined to the link outbound
queue. But the conversion depends on the assumption that sk_buffs have
the next and prev pointers at the beginning of the struct. Not only
might this assumption not hold, it may also prevent some improvements
to the generic networking SKB list management.

In response to this request, we have decided that all SKB queues in the
entire TIPC stack are now managed with the standard SKB list APIs associated
with struct sk_buff_head, which makes all relevant code cleaner. But before
doing that, we need to clean up the following redundant functionality:

- remove node subscribe infrastructure
- remove protocol message queue
- remove retransmission queue
- clean up process of pushing packets in link layer

After that, the following SKB queues are managed with standard SKB list APIs:

- link outqueue
- link deferred queue
- link receive queue
- link transmission queue

The series is based on Richard's latest netlink patchset.
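
As a rough illustration of what a queue managed through a dedicated head structure looks like, here is a minimal userspace sketch. It is not the kernel implementation: `struct pkt`, `struct pkt_head`, and the `pkt_*` helpers are hypothetical names that only loosely mirror the roles of `struct sk_buff_head`, `skb_queue_tail()`, `skb_peek()`, and `skb_dequeue()`, and no locking is modeled.

```c
#include <stddef.h>

/* Hypothetical packet buffer; in the kernel this role is played by sk_buff. */
struct pkt {
	struct pkt *next;
	struct pkt *prev;
	unsigned int seqno;
};

/* Hypothetical queue head that tracks both ends and the queue length. */
struct pkt_head {
	struct pkt *next;	/* first packet, or NULL when empty */
	struct pkt *prev;	/* last packet, or NULL when empty */
	unsigned int qlen;
};

static void pkt_queue_init(struct pkt_head *q)
{
	q->next = NULL;
	q->prev = NULL;
	q->qlen = 0;
}

/* Append a packet at the tail of the queue. */
static void pkt_queue_tail(struct pkt_head *q, struct pkt *p)
{
	p->next = NULL;
	p->prev = q->prev;
	if (q->prev)
		q->prev->next = p;
	else
		q->next = p;
	q->prev = p;
	q->qlen++;
}

/* Return the first packet without removing it. */
static struct pkt *pkt_peek(const struct pkt_head *q)
{
	return q->next;
}

/* Remove and return the first packet, or NULL if the queue is empty. */
static struct pkt *pkt_dequeue(struct pkt_head *q)
{
	struct pkt *p = q->next;

	if (!p)
		return NULL;
	q->next = p->next;
	if (q->next)
		q->next->prev = NULL;
	else
		q->prev = NULL;
	p->next = NULL;
	p->prev = NULL;
	q->qlen--;
	return p;
}
```

Because every queue operation goes through the head structure, callers never depend on where the next and prev pointers sit inside the packet struct.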

Ying Xue (10):

erik.hugne | 17 Oct 08:45 2014

[PATCH] tipc: reduce amount of duplicate nak messages

From: Erik Hugne <erik.hugne@ericsson.com>

If an out of sequence packet is received and it is not a duplicate,
it will be placed on the deferred queue and the peer notified that
there is a sequence gap. However, if the received packet fills a
hole in, or is placed at the end of the defer queue, a duplicate NAK
message will be sent out. This in turn will generate duplicate
retransmissions and degraded link performance.
We fix this by only sending out a new NAK when the received packet
is placed at the head of the defer queue.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>

This should be applied on top of Ying's skblist patchset.

 net/tipc/link.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/tipc/link.c b/net/tipc/link.c
index a74be61..45f94b7 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -1395,7 +1395,7 @@ static void link_handle_out_of_seq_msg(struct tipc_link *l_ptr,
 	if (tipc_link_defer_pkt(&l_ptr->deferred_queue, buf)) {
 		TIPC_SKB_CB(buf)->deferred = true;
-		if ((skb_queue_len(&l_ptr->deferred_queue) % 16) == 1)
+		if (skb_peek(&l_ptr->deferred_queue) == buf)
 			tipc_link_proto_xmit(l_ptr, STATE_MSG, 0, 0, 0, 0, 0);
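
To make the change in the hunk above concrete, here is a minimal userspace sketch, not the kernel code. It models the deferred queue as an ascending array of sequence numbers and puts the removed condition (queue length modulo 16 equals 1) next to the added one (the new packet is now at the head of the queue). The names `defer_queue`, `defer_pkt`, `nak_old`, and `nak_new` are hypothetical, and sequence-number wraparound is ignored for brevity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEFER_MAX 64

/* Hypothetical stand-in for the link's deferred queue: sequence numbers
 * kept in ascending order (wraparound ignored for brevity). */
struct defer_queue {
	uint32_t seqno[DEFER_MAX];
	size_t len;
};

/* Insert seqno in order. Returns false for a duplicate or a full queue. */
static bool defer_pkt(struct defer_queue *q, uint32_t seqno)
{
	size_t i, pos = 0;

	if (q->len == DEFER_MAX)
		return false;
	while (pos < q->len && q->seqno[pos] < seqno)
		pos++;
	if (pos < q->len && q->seqno[pos] == seqno)
		return false;
	for (i = q->len; i > pos; i--)
		q->seqno[i] = q->seqno[i - 1];
	q->seqno[pos] = seqno;
	q->len++;
	return true;
}

/* Condition on the removed line: queue length modulo 16 equals 1. */
static bool nak_old(const struct defer_queue *q)
{
	return (q->len % 16) == 1;
}

/* Condition on the added line: the packet just queued is now the head. */
static bool nak_new(const struct defer_queue *q, uint32_t seqno)
{
	return q->len > 0 && q->seqno[0] == seqno;
}
```

In this model, a packet appended behind an existing head (14 after 12) does not satisfy the new condition, while a packet that becomes the new head (10 arriving after 12 and 14) does.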

Matthew Clark | 16 Oct 19:13 2014

TIPC 2.0.0 packets not being transmitted

I'm trying to set up a TIPC cluster using a variety of ARM-based processors,
but I'm having issues with some zc706 Zynq boards running a Yocto-built
kernel 3.14.2. Some nodes see their neighbors perfectly well, but the zc706
boards can't be seen by anyone and think everyone else is down. I ran Wireshark
and from what I can tell, the zc706 boards simply aren't broadcasting any
packets. I see TIPC packets flying around from the overos and zedboard, but
nothing from the zynqs.

Can anyone help me debug this? I'm at a bit of a loss to explain the
behavior. Thanks!



Linux overo 3.5.7 #1 PREEMPT Tue Mar 11 09:06:14 EDT 2014 armv7l GNU/Linux

Linux zedboard 3.8.0-xilinx #4 SMP PREEMPT Thu Jul 10 15:13:36 EDT 2014
armv7l GNU/Linux

Linux zc706 3.14.2-xilinx #2 SMP PREEMPT Thu Oct 2 14:53:07 EDT 2014 armv7l

Ying Xue | 15 Oct 09:27 2014

[PATCH] tipc: fix lockdep warning when intra-node messages are delivered

When running the tipcTC & tipcTS test suite, the following lockdep unsafe
locking scenario is reported:

[ 1109.997854]
[ 1109.997988] =================================
[ 1109.998290] [ INFO: inconsistent lock state ]
[ 1109.998575] 3.17.0-rc1+ #113 Not tainted
[ 1109.998762] ---------------------------------
[ 1109.998762] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[ 1109.998762] swapper/7/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
[ 1109.998762]  (slock-AF_TIPC){+.?...}, at: [<ffffffffa0011969>] tipc_sk_rcv+0x49/0x2b0 [tipc]
[ 1109.998762] {SOFTIRQ-ON-W} state was registered at:
[ 1109.998762]   [<ffffffff810a4770>] __lock_acquire+0x6a0/0x1d80
[ 1109.998762]   [<ffffffff810a6555>] lock_acquire+0x95/0x1e0
[ 1109.998762]   [<ffffffff81a2d1ce>] _raw_spin_lock+0x3e/0x80
[ 1109.998762]   [<ffffffffa0011969>] tipc_sk_rcv+0x49/0x2b0 [tipc]
[ 1109.998762]   [<ffffffffa0004fe8>] tipc_link_xmit+0xa8/0xc0 [tipc]
[ 1109.998762]   [<ffffffffa000ec6f>] tipc_sendmsg+0x15f/0x550 [tipc]
[ 1109.998762]   [<ffffffffa000f165>] tipc_connect+0x105/0x140 [tipc]
[ 1109.998762]   [<ffffffff817676ee>] SYSC_connect+0xae/0xc0
[ 1109.998762]   [<ffffffff81767b7e>] SyS_connect+0xe/0x10
[ 1109.998762]   [<ffffffff817a9788>] compat_SyS_socketcall+0xb8/0x200
[ 1109.998762]   [<ffffffff81a306e5>] sysenter_dispatch+0x7/0x1f
[ 1109.998762] irq event stamp: 241060
[ 1109.998762] hardirqs last  enabled at (241060): [<ffffffff8105a4ad>] __local_bh_enable_ip+0x6d/0xd0
[ 1109.998762] hardirqs last disabled at (241059): [<ffffffff8105a46f>] __local_bh_enable_ip+0x2f/0xd0
[ 1109.998762] softirqs last  enabled at (241020): [<ffffffff81059a52>] _local_bh_enable+0x22/0x50
[ 1109.998762] softirqs last disabled at (241021): [<ffffffff8105a626>] irq_exit+0x96/0xc0
[ 1109.998762]
[ 1109.998762] other info that might help us debug this:

Erik Hugne | 15 Oct 08:54 2014

Problem with wrapping sequence numbers

Way back, I posted some early results/code that adds GRO capabilities to TIPC.
This has little to no impact for 1G links (slightly less CPU overhead for
packet processing), but for 10G and above, we saw a throughput increase of up
to 30%.
The first draft only worked on SOCK_STREAM data, but after Jon moved all
message fragmentation down to the TIPC link layer (thanks for that!) I rewrote
it to work on MSG_FRAGMENTER type packets. The GRO code leverages the fact that
fragments are sent back-to-back, and usually arrive in-order with minimal loss.
Each link will at any time have zero or one active flow, and a flow is flushed
either when we receive a lastfrag, the received fragment ID (or "long msgno")
does not match the expected one, or when the device layer feels like it.

During testing of this, I was simulating varying amounts of packet loss and
reordering when I noticed that if a packet has been deferred, and the link
sequence numbers wrap around before this packet is flushed back into the receive
chain, the sequence number gap calculation goes bananas. I believe this is a
problem in the baseline (without GRO) as well.

This small patch to the tipc_link_proto_xmit() STATE_MSG handling seems to
correct the issue:
                        u32 rec = buf_seqno(skb_peek(&l_ptr->deferred_queue));
-                       gap = mod(rec - mod(l_ptr->next_in_no));
+                       if (rec > mod(l_ptr->next_in_no))
+                               gap = mod(rec - mod(l_ptr->next_in_no));
+                       else
+                               gap = mod(mod(l_ptr->next_in_no) - rec);

If anyone confirms this, we should probably post a correction to -net asap.
Otherwise I will include it along with the complete GRO set.
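
To make the arithmetic in the patch concrete, here is a small standalone sketch of the two gap calculations. It assumes that link sequence numbers are 16 bits wide and that `mod()` simply masks with 0xffff; the function names `gap_baseline` and `gap_proposed` are hypothetical and only label the removed and added lines of the patch.

```c
#include <stdint.h>

/* Assumption: sequence numbers are 16 bits wide, so mod() masks to 0xffff. */
static uint32_t mod(uint32_t x)
{
	return x & 0xffffu;
}

/* Gap as computed by the removed line of the patch. */
static uint32_t gap_baseline(uint32_t rec, uint32_t next_in_no)
{
	return mod(rec - mod(next_in_no));
}

/* Gap as computed by the added lines of the patch. */
static uint32_t gap_proposed(uint32_t rec, uint32_t next_in_no)
{
	if (rec > mod(next_in_no))
		return mod(rec - mod(next_in_no));
	return mod(mod(next_in_no) - rec);
}
```

Under these assumptions both versions agree whenever rec is numerically larger than next_in_no (rec = 10, next_in_no = 7 gives 3). They differ otherwise: rec = 7 with next_in_no = 10 gives 65533 for the baseline and 3 for the proposed version, while rec = 2 with next_in_no = 0xfffe gives 4 for the baseline and 65532 for the proposed version. Which result is intended depends on whether the deferred packet is ahead of or behind next_in_no, which this sketch does not settle.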


Ying Xue | 15 Oct 04:14 2014

[PATCH next-net v3] rps: support tipc

The TIPC protocol introduces an abstract link layer which provides a
reliable message delivery mechanism for connection-based,
connectionless, as well as multicast messages. This means that
message sequence control and retransmission are handled at the link
layer rather than in a common transport layer as with TCP or SCTP.
Therefore, the target CPU for RPS is determined by calculating
a flow hash over a 2-tuple (i.e., previous and destination node address).

However, the TIPC message header has several different formats, so we
have to consider them separately:

- Multicast message. In the multicast message header, the previous node
  address varies, but the destination node address is always 0. To
  make RPS treat all multicast messages as one flow and steer them to
  one CPU, 0xffffffff and 0 are assumed as their previous and
  destination node addresses respectively.

- Unicast message. There are two types of unicast message:
  connection-oriented and connectionless. The former contains a
  valid source node address, but doesn't include a destination node
  address; the latter contains both. To ensure all unicast messages
  with the same destination are steered to one target CPU, the source
  node address and 0 are deemed as flow source and destination addresses

Signed-off-by: Ying Xue <ying.xue@windriver.com>
 net/core/flow_dissector.c |   22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)
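
The tuple selection described in the commit message can be sketched in plain C as follows. This is an illustration only, not the code added to net/core/flow_dissector.c: `struct tipc_hdr_view`, `struct flow_tuple`, `tipc_flow_tuple()`, and `pick_cpu()` are hypothetical, and the hash in `pick_cpu()` is a toy stand-in for the kernel's flow hash.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of the header fields the text mentions. */
struct tipc_hdr_view {
	bool is_mcast;
	uint32_t prev_node;	/* previous node address (varies for multicast) */
	uint32_t src_node;	/* source node address (unicast) */
};

/* The 2-tuple that is fed into the flow hash. */
struct flow_tuple {
	uint32_t src;
	uint32_t dst;
};

/* Choose the tuple as described: a fixed pair for all multicast messages,
 * and (source node, 0) for unicast messages. */
static struct flow_tuple tipc_flow_tuple(const struct tipc_hdr_view *h)
{
	struct flow_tuple t;

	if (h->is_mcast) {
		t.src = 0xffffffffu;	/* prev_node is deliberately ignored */
		t.dst = 0;
	} else {
		t.src = h->src_node;
		t.dst = 0;
	}
	return t;
}

/* Toy hash for illustration only; ncpus must be non-zero. */
static unsigned int pick_cpu(struct flow_tuple t, unsigned int ncpus)
{
	uint32_t h = (t.src * 2654435761u) ^ t.dst;

	return h % ncpus;
}
```

Since every multicast message maps to the same tuple, all of them land on the same CPU in this model, and unicast messages from the same source node likewise share a CPU.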


Jon Maloy | 10 Oct 21:31 2014

[RFC: 0/6] tipc: resolve message disordering at reception

When TIPC is receiving messages from multi-threaded devices it may
occasionally deliver messages to their destination sockets in the
wrong order. This happens despite correct resequencing at the link
layer, because the upcall path from link to socket is done with
no locks held.

The commits in this series solve the problem by introducing an
'input' message queue in each link, through which messages must
be delivered to the upper layers.

Jon Maloy (5):
  tipc: resolve race problem during message reception
  tipc: eliminate socket wakeup message queue from node struct
  tipc: simplify connection abort notifications when links break
  tipc: simplify socket multicast reception
  tipc: eliminate race condition at multicast msg reception

Ying Xue (1):
  tipc: fix a potential deadlock

 net/tipc/bcast.c      |  99 +++++++++-------------------
 net/tipc/bcast.h      |  18 ------
 net/tipc/core.h       |   1 +
 net/tipc/link.c       | 174 +++++++++++++++++++++++++++-----------------------
 net/tipc/link.h       |   6 +-
 net/tipc/msg.c        |  33 ++++++++++
 net/tipc/msg.h        |   2 +-
 net/tipc/name_distr.c |  29 +++++----
 net/tipc/name_distr.h |   2 +-
 net/tipc/name_table.c |  42 +++++++++++-

richard.alpe | 10 Oct 14:38 2014

[PATCH v6 00/14] tipc: new netlink API

From: Richard Alpe <richard.alpe@ericsson.com>

net-next is closed, here is what I plan to send in once it opens.

The old API is not removed.

The new API is separated from the old because of a bug in the old
tipc-config utility using it. When adding commands to the existing
genl_ops struct the get-family response message grows to a point where
it overflows the small receive buffer in tipc-config, subsequently
breaking the tool. Hence the two genl_family and genl_ops structs.

Redesigned the "socket list" command to address David Miller's comments on
net-next v1 of this patchset.

Simply put, the problem is that we can have an arbitrary number of
sockets with an arbitrary number of associated publications. In the
previous patchset this was solved by nesting as many publications as
possible into a socket. If they didn't all fit, it sent the same socket again
with the remaining publications. As David Miller pointed out, this makes
each message malformed, as the receiver cannot know from the data itself
whether it has received a complete set or not. This was flagged outside of the
data and the client did the reassembly.

o socket 1
  o publ 1
  o publ 2