Ying Xue | 18 Apr 14:04 2014

[PATCH net-next 0/3] fix potential deadlocks when ports are scheduled/waken up

The order of taking the port_lock_list lock and the port lock is
reversed while ports are scheduled, which can cause deadlock. In
addition, there are still two special cases that result in deadlock
when ports are woken up, because we do not correctly handle the
relationship between the node lock and the port/socket lock. This
series fixes the two issues and also converts the tipc_ports list to
an RCU list.
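
For reference, the conversion mentioned above follows the usual
list-to-RCU-list pattern. The sketch below is an editor's illustration,
not the actual TIPC code; example_port, example_ports and
example_port_list_lock are stand-ins for tipc_port, the tipc_ports list
and its lock:

/*
 * Editor's sketch only: writers modify the list under a spinlock using
 * the _rcu list helpers, readers only take rcu_read_lock().
 */
#include <linux/list.h>
#include <linux/rculist.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/types.h>

struct example_port {
	struct list_head port_list;
	struct rcu_head rcu;
	u32 ref;
};

static LIST_HEAD(example_ports);
static DEFINE_SPINLOCK(example_port_list_lock);

static void example_port_add(struct example_port *p)
{
	spin_lock_bh(&example_port_list_lock);
	list_add_tail_rcu(&p->port_list, &example_ports);
	spin_unlock_bh(&example_port_list_lock);
}

static void example_port_del(struct example_port *p)
{
	spin_lock_bh(&example_port_list_lock);
	list_del_rcu(&p->port_list);
	spin_unlock_bh(&example_port_list_lock);
	kfree_rcu(p, rcu);	/* free only after all current readers are done */
}

static void example_ports_walk(void)
{
	struct example_port *p;

	rcu_read_lock();	/* no list lock needed on the read side */
	list_for_each_entry_rcu(p, &example_ports, port_list) {
		/* read-only access to *p; take a reference or the port
		 * lock before doing anything that must outlive this
		 * RCU read-side critical section */
		;
	}
	rcu_read_unlock();
}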

Ying Xue (3):
  tipc: fix a potential deadlock when port is scheduled
  tipc: fix a potential deadlock when port is waken up
  tipc: convert tipc_ports list to RCU list

 net/tipc/link.c |   65 ++++++++++++++++++++++++++++++++++---------------------
 net/tipc/port.c |   44 +++++++++++++++++++++++++------------
 net/tipc/port.h |   10 ++-------
 net/tipc/ref.c  |   13 +++++++++++
 net/tipc/ref.h  |    1 +
 5 files changed, 86 insertions(+), 47 deletions(-)

-- 
1.7.9.5

Ying Xue | 18 Apr 09:18 2014

[PATCH net-next v2] tipc: fix race in disc create/delete

Commit a21a584d6720ce349b05795b9bcfab3de8e58419 (tipc: fix neighbor
detection problem after hw address change) introduces a race condition
involving tipc_disc_delete() and tipc_disc_add/remove_dest() that can
cause TIPC to dereference the pointer to the bearer discovery request
structure after it has been freed, since a stray pointer is left behind
in the bearer structure.

To fix the issue, the process of resetting the discovery request
handler is reworked: the discovery request handler and request buffer
are simply reset in place instead of being freed, re-allocated and
re-initialized. As the request pointer is always valid and the
request's lock is held while the handler is reset, the race can no
longer happen.
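
As an illustration of the reset-in-place approach, here is an editor's
sketch; the struct and field names below are made up for the example
and do not match net/tipc/discover.c exactly:

#include <linux/jiffies.h>
#include <linux/skbuff.h>
#include <linux/spinlock.h>
#include <linux/timer.h>

/* Stand-in for struct tipc_link_req; field names are illustrative. */
struct disc_req_sketch {
	spinlock_t lock;	/* also taken by add_dest()/remove_dest() */
	struct sk_buff *buf;	/* pre-allocated discovery request message */
	int num_nodes;
	unsigned long timer_intv;
	struct timer_list timer;
};

/* Re-initialize the existing request object instead of freeing it and
 * allocating a new one, so the pointer held by the bearer never goes
 * stale.
 */
static void disc_reset_sketch(struct disc_req_sketch *req)
{
	spin_lock_bh(&req->lock);
	/* rewrite the headers of req->buf in place here */
	req->num_nodes = 0;
	req->timer_intv = msecs_to_jiffies(125);	/* arbitrary initial probe interval for the sketch */
	mod_timer(&req->timer, jiffies + req->timer_intv);
	spin_unlock_bh(&req->lock);
}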

Reported-by: Erik Hugne <erik.hugne <at> ericsson.com>
Signed-off-by: Ying Xue <ying.xue <at> windriver.com>
Reviewed-by: Erik Hugne <erik.hugne <at> ericsson.com>
Tested-by: Erik Hugne <erik.hugne <at> ericsson.com>
---
 v2:
   - update patch description with longer commit id
   - remove dest parameter from tipc_disc_reset()

 net/tipc/bearer.c   |    3 +--
 net/tipc/discover.c |   53 ++++++++++++++++++++++++++++++++++-----------------
 net/tipc/discover.h |    1 +
 3 files changed, 37 insertions(+), 20 deletions(-)

diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c
index 3abd970..f3259d4 100644
--- a/net/tipc/bearer.c
(Continue reading)

Ying Xue | 18 Apr 08:05 2014

[PATCH net-next v2 00/10] purge tipc_net_lock

Currently, the TIPC routing hierarchy comprises the structures 'node',
'link' and 'bearer'. The whole hierarchy is protected by a big
read/write lock, tipc_net_lock, to ensure that nothing is added or
removed while code is accessing any of these structures. This locking
policy binds the node, link and bearer components closely together,
making their relationship unnecessarily complex. In the worst case,
such a locking policy not only has a negative influence on performance,
but is also prone to occasional deadlock.

In order to decouple the complex relationship between bearer and node
as well as link, the locking policy is adjusted as follows:

- Bearer level
  The RTNL lock is used on the update side, and RCU is used on the
  read side. Meanwhile, all bearer instances, including the broadcast
  bearer, are saved into the bearer_list array.

- Node and link level
  All node instances are saved into two lists, tipc_node_list and
  node_htable. The two lists are protected by node_list_lock on the
  write side and guarded by RCU on the read side. All members of the
  node structure, including link instances, are protected by the node
  spin lock.

- The relationship between bearer and node
  When a link accesses a bearer, it first looks the bearer up by its
  bearer identity in the bearer_list array. When a bearer accesses a
  node, it iterates the node_htable hash list with the node address to
  find the corresponding node.

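To make the read-side pattern concrete, the sketch below shows how a
lookup proceeds under this policy. It is an editor's illustration with
simplified stand-in types (bearer_sketch, node_sketch,
bearer_list_sketch, node_htable_sketch), not the real
tipc_bearer/tipc_node definitions:

#include <linux/list.h>
#include <linux/rculist.h>
#include <linux/rcupdate.h>
#include <linux/spinlock.h>
#include <linux/types.h>

#define SKETCH_MAX_BEARERS	3
#define SKETCH_HTABLE_SIZE	512

struct bearer_sketch {		/* stand-in for struct tipc_bearer */
	u32 identity;
};

struct node_sketch {		/* stand-in for struct tipc_node */
	struct hlist_node hash;
	spinlock_t lock;
	u32 addr;
};

/* The write side (RTNL for bearers, node_list_lock for nodes) is
 * omitted; only the read side described above is shown.
 */
static struct bearer_sketch __rcu *bearer_list_sketch[SKETCH_MAX_BEARERS];
static struct hlist_head node_htable_sketch[SKETCH_HTABLE_SIZE];

static void link_touch_peer(u32 bearer_id, u32 addr)
{
	struct bearer_sketch *b;
	struct node_sketch *n;

	rcu_read_lock();
	/* Bearer level: an RCU read lock is all a reader needs. */
	b = rcu_dereference(bearer_list_sketch[bearer_id]);
	if (!b) {
		rcu_read_unlock();
		return;
	}

	/* Node level: find the node by address under RCU, then take the
	 * node spin lock before touching its members (incl. links). */
	hlist_for_each_entry_rcu(n, &node_htable_sketch[addr % SKETCH_HTABLE_SIZE], hash) {
		if (n->addr != addr)
			continue;
		spin_lock_bh(&n->lock);
		/* ... safe to use b and the node's link instances here ... */
		spin_unlock_bh(&n->lock);
		break;
	}
	rcu_read_unlock();
}
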
In the new locking policy, every component has its private locking
(Continue reading)

Ying Xue | 8 Apr 10:17 2014

[PATCH] tipc: fix race in disc create/delete

Commit a21a58 (tipc: fix neighbor detection problem after hw address
change) introduces a race condition involving tipc_disc_delete() and
tipc_disc_add/remove_dest() that can cause TIPC to dereference the
pointer to the bearer discovery request structure after it has been
freed, since a stray pointer is left behind in the bearer structure.

To fix the issue, the process of resetting the discovery request
handler is reworked: the discovery request handler and request buffer
are simply reset in place instead of being freed, re-allocated and
re-initialized. As the request pointer is always valid and the
request's lock is held while the handler is reset, the race can no
longer happen.

Reported-by: Erik Hugne <erik.hugne <at> ericsson.com>
Signed-off-by: Ying Xue <ying.xue <at> windriver.com>
---
 net/tipc/bearer.c   |    3 +--
 net/tipc/discover.c |   55 ++++++++++++++++++++++++++++++++++-----------------
 net/tipc/discover.h |    1 +
 3 files changed, 39 insertions(+), 20 deletions(-)

diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c
index 3abd970..4e4da7b 100644
--- a/net/tipc/bearer.c
+++ b/net/tipc/bearer.c
@@ -365,9 +365,8 @@ restart:
 static int tipc_reset_bearer(struct tipc_bearer *b_ptr)
 {
 	pr_info("Resetting bearer <%s>\n", b_ptr->name);
-	tipc_disc_delete(b_ptr->link_req);
 	tipc_link_reset_list(b_ptr->identity);
(Continue reading)

erik.hugne | 4 Apr 13:21 2014

[RFC v4 0/3] tipc: link state subscriptions

From: Erik Hugne <erik.hugne <at> ericsson.com>

v4:
 -Revised commit message for all patches, and added a 
  description for the race resolved by patch #1
 -Instead of tipc_name_seq, peer should be a __u32 in struct tipc_sioc_ln_req
 -Renamed TIPC_LINK_SRV to TIPC_LINK_STATE
 -Changed pointer names l_ptr/n_ptr to link/node

Updated client program: https://gist.github.com/Hugne/9936175

v3: 
link publications are published/withdrawn directly from bh
context in tipc_node_link_up/down. There was never any need to 
defer this processing. The computational overhead added by the
nametable operations during link up/down was enough for a race
condition between disc create/delete. (the oops mentioned earlier)
This issue is resolved with patch #1.

A single subscription to publications of type TIPC_LINK_SRV will now
generate events for all network planes. The bearer identity is
included in the event.port.ref.

I've added a tipc_ prefix to the ioctl request struct name, and
a new u32 member for the bearer identity.
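
Putting the notes above together, the ioctl request ends up looking
roughly like the sketch below. This is reconstructed from this cover
letter, not copied from the patch; the ioctl number and the
TIPC_MAX_LINK_NAME limit are assumptions, and the authoritative
definition is in patch #3:

/* Editor's sketch of the link-name ioctl request described above. */
#include <linux/sockios.h>
#include <linux/tipc_config.h>	/* assumed source of TIPC_MAX_LINK_NAME */
#include <linux/types.h>

#define SIOCGETLINKNAME	SIOCPROTOPRIVATE	/* assumed ioctl number */

struct tipc_sioc_ln_req {
	__u32 peer;				/* peer node address (was a name_seq in v3) */
	__u32 bearer_id;			/* matches event.port.ref in the topology event */
	char linkname[TIPC_MAX_LINK_NAME];	/* filled in by the kernel */
};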

Erik Hugne (3):
  tipc: fix race in disc create/delete
  tipc: add support for link state subscriptions
  tipc: add ioctl to fetch link names
(Continue reading)

Ying Xue | 4 Apr 11:55 2014

[PATCH net-next 00/10] purge tipc_net_lock

Currently, the TIPC routing hierarchy comprises the structures 'node',
'link' and 'bearer'. The whole hierarchy is protected by a big
read/write lock, tipc_net_lock, to ensure that nothing is added or
removed while code is accessing any of these structures. This locking
policy binds the node, link and bearer components closely together,
making their relationship extremely complex. In the worst case, such a
locking policy not only has a negative influence on performance, but is
also prone to occasional deadlock.

In order to decouple the complex relationship between bearer and node
as well as link, the locking policy is adjusted as follows:

- Bearer level
  The RTNL lock is used on the update side, and RCU is used on the
  read side. Meanwhile, all bearer instances, including the broadcast
  bearer, are saved into the bearer_list array.

- Node and link level
  All node instances are saved into two lists, tipc_node_list and
  node_htable. The two lists are protected by node_list_lock on the
  write side and guarded by RCU on the read side. All members of the
  node structure, including link instances, are protected by the node
  spin lock.

- The relationship between bearer and node
  When a link accesses a bearer, it first looks the bearer up by its
  bearer identity in the bearer_list array. When a bearer accesses a
  node, it iterates the node_htable hash list with the node address to
  find the corresponding node.

In the new locking policy, every component has its private locking
(Continue reading)

erik.hugne | 3 Apr 17:01 2014

[RFC v3 0/3] tipc: link state subscriptions

From: Erik Hugne <erik.hugne <at> ericsson.com>

v3: link publications are published/withdrawn directly from bh
context in tipc_node_link_up/down. There was never any need to 
defer this processing. The computational overhead added by the
nametable operations during link up/down was enough for a race
condition between disc create/delete. (the oops mentioned earlier)
This issue is resolved with patch #1.

A single subscription to publications of type TIPC_LINK_SRV will now
generate events for all network planes. The bearer identity is
included in the event.port.ref.

I've added a tipc_ prefix to the ioctl request struct name, and
a new u32 member for the bearer identity.

Updated client test program to reflect these changes:
https://gist.github.com/Hugne/9936175

Erik Hugne (3):
  tipc: fix race in disc create/delete
  tipc: add support for link state subscriptions
  tipc: add ioctl to fetch link names

 include/uapi/linux/tipc.h | 10 ++++++++++
 net/tipc/discover.c       |  5 +++++
 net/tipc/node.c           | 34 +++++++++++++++++++++++++++++++++-
 net/tipc/node.h           |  1 +
 net/tipc/socket.c         | 29 ++++++++++++++++++++++++++---
 5 files changed, 75 insertions(+), 4 deletions(-)
(Continue reading)

Jon Maloy | 3 Apr 02:18 2014

[RFC: 00/21] tipc: changes in send and recv code paths

This commit series introduces some significant changes to the code path
for message sending and reception, both for unicast and multicast
transmission.

The changes can be summarized as follows:

1) The socket (tipc_sock/socket.c) is now the endpoint of all user data
   transport. The port (tipc_port/port.c) has been reduced to become an
   internal support structure for the socket, taking care of connection
   supervision and the traditional message based flow control. We may
   remove it completely in the future.

2) There is now only one code path through the link when sending
   messages. This is implemented by the function tipc_link_xmit() and
   its internal helper function __tipc_link_xmit(). All other send
   functions in the link are gone.

3) The new tipc_link_xmit() takes a pre-fragmented buffer chain as
   input, and only needs to add it to the link's send queue before
   sending as much of it as it can to the media.

4) The new tipc_link_xmit() also makes the selection between node local
   or node external delivery. This is done in one single place, and
   all the previous selection points in port.c and net.c are gone.

5) There is a new, re-entrant function tipc_msg_build(), that takes
   a data iovec and the destination link MTU as input, and then
   directly, in one loop, copies the data into as many fragment buffers
   as needed, at the same time building the message and fragment
   headers (a schematic of this loop is sketched below).

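To make point 5 concrete, here is an editor's schematic of that
single-pass build-and-fragment loop, written as plain C. The header
size, struct layout and function name are invented for illustration;
this is not the proposed tipc_msg_build():

/*
 * Schematic only: walk the iovec once, filling fragment buffers of at
 * most mtu bytes and reserving room for a small header in each as we
 * go. Requires mtu > HDR_SIZE; on allocation failure the partially
 * built chain is leaked in this simplified sketch.
 */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

#define HDR_SIZE 16		/* placeholder fragment header size */

struct frag {
	struct frag *next;
	size_t len;		/* header + payload bytes used so far */
	uint8_t data[];
};

static struct frag *msg_build_sketch(const struct iovec *iov, int iovcnt,
				     size_t mtu)
{
	struct frag *head = NULL, **tail = &head, *f = NULL;
	size_t room = 0;	/* payload space left in current fragment */

	for (int i = 0; i < iovcnt; i++) {
		const uint8_t *src = iov[i].iov_base;
		size_t left = iov[i].iov_len;

		while (left) {
			if (!room) {	/* start a new fragment */
				f = malloc(sizeof(*f) + mtu);
				if (!f)
					return NULL;
				f->next = NULL;
				f->len = HDR_SIZE;
				memset(f->data, 0, HDR_SIZE);	/* header would be written here */
				*tail = f;
				tail = &f->next;
				room = mtu - HDR_SIZE;
			}
			size_t chunk = left < room ? left : room;

			memcpy(f->data + f->len, src, chunk);
			f->len += chunk;
			room -= chunk;
			src += chunk;
			left -= chunk;
		}
	}
	return head;
}
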
(Continue reading)

erik.hugne | 2 Apr 17:22 2014

[RFC 0/2] tipc: link state subscriptions

From: Erik Hugne <erik.hugne <at> ericsson.com>

I consider this a major improvement over the previous version.
If we don't pack the actual link name into the link up/down events,
we obviously need to fetch it from somewhere else.

Doing this with a general TIPC ioctl seemed to be the easiest way, and
even though the implementation is a little hacky at the moment, it does
the job.

What goes on in the demo below is that the client issues a plane A+B
subscription to the topology server. When it receives a publish/withdraw
event, an SIOCGETLINKNAME ioctl request (introduced in patch #2) is
issued on the topology subscription socket, asking for the link name
corresponding to the publish (link up) or withdraw (link down) event,
as sketched below.

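The client-side call looks roughly like the sketch below; the request
struct and its field sizes are illustrative stand-ins (the real
definition is added by patch #2 and changes in later revisions of this
series):

#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/sockios.h>
#include <linux/types.h>

#ifndef SIOCGETLINKNAME
#define SIOCGETLINKNAME	SIOCPROTOPRIVATE	/* assumed; defined by patch #2 */
#endif

struct link_name_req {		/* illustrative stand-in for the real request struct */
	__u32 peer;		/* peer node taken from the publish/withdraw event */
	char linkname[68];	/* buffer size chosen for the sketch only */
};

/* topsrv_sd is the already-connected topology subscription socket. */
static void print_link_name(int topsrv_sd, __u32 peer, int up)
{
	struct link_name_req req;

	memset(&req, 0, sizeof(req));
	req.peer = peer;
	if (ioctl(topsrv_sd, SIOCGETLINKNAME, &req) < 0) {
		perror("SIOCGETLINKNAME");
		return;
	}
	printf("Link: <%s> is %s\n", req.linkname, up ? "UP" : "DOWN");
}
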
node1 ~ # /tmp/client_tipc 
TIPC Topology subscriber started
Client: connected to topology server
Client: issued subscription to {18888,0,100}, timeout 500000, user ref: 2
Client: issued subscription to {0,0,4294967295}, timeout 4294967295, user ref: 3
Client: issued subscription to {10,0,4294967295}, timeout 4294967295, user ref: 4
Client: issued subscription to {11,0,4294967295}, timeout 4294967295, user ref: 5
Client: subscriptions remain active until client is killed
Client: received event for published {10,16781314,16781314} port id <1001001:0>
associated with network node 1001001
generated by subscription to {10,0,4294967295}, timeout 4294967295, user ref: 4
Link: <1.1.1:eth0-1.1.2:eth0> is UP
Client: received event for withdrawn {10,16781314,16781314} port id <1001001:0>
associated with network node 1001001
(Continue reading)

Jon Maloy | 3 Apr 03:12 2014

Loopback link

Ying,
I know you have a different view about the loopback link than what I present in the patch series
I just sent out, so I feel I need to justify my statements there.

First, we need to keep in mind that TIPC first and foremost is an IPC service, not a data transport
service. It is nice if we can beat TCP on its home turf, which is massive data transfer (and we will),
but this is still not the main purpose of TIPC.

So, it is not sufficient to be somewhat better than TCP at node local message delivery. We should
be *much* better, especially regarding latency. And better than UNIX sockets, System V IPC and
probably a few others of the kind that are around.

The real truth about the performance through the loopback link is not found by comparing with
TCP, but by running the TIPC benchmark with and without the loopback link.

I did that, and could confirm what I suspected: the performance degradation caused by going
through the loopback interface is dramatic.

Look at the figures below, run on a 4-core machine (VM). The first series is without loopback, the
second one with.

In summary, you will see that latency times typically are ~50% longer when using loopback,
while throughput is 2-3 times lower.

And it gets worse: because of the transport bottleneck that the loopback link/interface in
practice introduces, throughput is not likely to increase with the number of cores. In contrast,
if we let all the cores work in parallel with direct socket-to-socket delivery, as we do now, I am
pretty sure we will see throughput scale close to linearly with the number of cores.

Also, the reasons you have stated about code complexity and locking problems are
(Continue reading)

erik.hugne | 1 Apr 18:00 2014

[PATCH] tipc: fix regression bug where node events are not being generated

From: Erik Hugne <erik.hugne <at> ericsson.com>

Commit 5902385a2440a55f005b266c93e0bb9398e5a62b ("tipc: obsolete
the remote management feature") introduces a regression where node
topology events are not being generated, because the publication
that triggers them, {0, <z.c.n>, <z.c.n>}, is no longer available.
This will break applications that rely on node events to discover
when nodes join/leave a cluster.

We fix this by advertising the node publication when TIPC enters
networking mode, and withdrawing it upon shutdown.

Signed-off-by: Erik Hugne <erik.hugne <at> ericsson.com>
---
The patch is based on net-next, as the commit that introduces the
regression, 5902385, was only recently applied to that tree.

 net/tipc/net.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/tipc/net.c b/net/tipc/net.c
index 0374a81..4c564eb 100644
--- a/net/tipc/net.c
+++ b/net/tipc/net.c
@@ -182,6 +182,8 @@ void tipc_net_start(u32 addr)
 	tipc_bclink_init();
 	write_unlock_bh(&tipc_net_lock);

+	tipc_nametbl_publish(TIPC_CFG_SRV, tipc_own_addr, tipc_own_addr,
+			     TIPC_ZONE_SCOPE, 0, tipc_own_addr);
(Continue reading)

