Ying Xue | 22 Apr 13:30 2014

[RFC 0/5] purge signal handler infrastructure

Delaying actions so that they execute asynchronously in another
context adds complexity to both the code and the locking policy. In
addition, because the signal handler is stopped first when the tipc
module is removed, there are potential risks: for instance, signal
requests may still be submitted to the signal handler infrastructure
after it has been stopped, which can cause resources to be freed
incorrectly.

This series therefore converts all actions that were executed
asynchronously in tasklet context, via the interface provided by the
signal handler infrastructure, into synchronous execution, and then
removes the signal handler infrastructure entirely.

Ying Xue (5):
  tipc: avoid to asynchronously notify subscriptions
  tipc: eliminate node_name_purge_complete routine
  tipc: avoid to asynchronously deliver name tables to peer node
  tipc: avoid to asynchronously reset all links
  tipc: purge signal handler infrastructure

 net/tipc/Makefile      |    2 +-
 net/tipc/bcast.c       |    2 +-
 net/tipc/bcast.h       |    1 +
 net/tipc/core.c        |    7 ---
 net/tipc/core.h        |    6 +--
 net/tipc/handler.c     |  134 ------------------------------------------------
 net/tipc/link.c        |   21 +++-----
 net/tipc/name_distr.c  |   52 +------------------
 net/tipc/name_distr.h  |   30 ++++++++++-
 net/tipc/node.c        |   46 ++++++++++-------
(Continue reading)

Ying Xue | 22 Apr 11:31 2014

[PATCH net-next v2 0/6] fix potential deadlocks when ports are scheduled/waken up

The port list lock (port_lock_list) and the port lock are taken in
reverse order when ports are scheduled, which can cause deadlock.
Meanwhile, there are still two special cases that result in deadlock
when ports are woken up, because the relationship between the node
lock and the port/socket lock is not handled correctly. This series
fixes the two issues and also converts the tipc_ports list to an RCU
list.

v2:
 - add several prepared patches
 - postpone port wakeup actions until node lock is released

Ying Xue (6):
  tipc: fix a potential deadlock when port is scheduled
  tipc: always use tipc_node_lock() to get node lock
  tipc: adjust order of variables in tipc_node structure
  tipc: rename setup_blocked variable of node struct to flags
  tipc: fix a potential deadlock when port is waken up
  tipc: convert tipc_ports list to RCU list

 net/tipc/bcast.c      |   21 ++++++++++
 net/tipc/bcast.h      |    2 +
 net/tipc/link.c       |  109 ++++++++++++++++++++++++++++++-------------------
 net/tipc/link.h       |   17 ++++++++
 net/tipc/name_distr.c |    6 +--
 net/tipc/node.c       |   32 +++++++++++++--
 net/tipc/node.h       |   91 +++++++++++++++++++++++------------------
 net/tipc/port.c       |   44 +++++++++++++-------
 net/tipc/port.h       |   10 +----
 net/tipc/ref.c        |   13 ++++++
 net/tipc/ref.h        |    1 +
(Continue reading)

Alex Jones | 21 Apr 20:49 2014

soft lockup when unloading kernel module

Hello,

     I am seeing a soft lockup when unloading the tipc kernel module 
when there are other nodes in the cluster.  If this is the only node in 
the cluster, I don't see a lockup when unloading.  I didn't see this in 
the bug list, so I thought I would post here.

     I'm using OpenSUSE 12.2 and a 3.4.47 kernel.  This is an Ivy Bridge 
Intel CPU with 20 cores.

     I've tried backporting TIPC from 3.12.17, but the problem is still 
there.

     Here is the trace:

done
[ 1215.706297] tipc: Disabling bearer <eth:bond0.250>
[ 1215.711077] tipc: Lost link <1.1.6:bond0.250-1.1.3:bond0.250> on 
network plane A
[ 1215.718449] tipc: Lost contact with <1.1.3>
[ 1215.722622] tipc: Lost link <1.1.6:bond0.250-1.1.4:bond0.250> on 
network plane A
[ 1215.729991] tipc: Lost contact with <1.1.4>
[ 1215.734164] tipc: Lost link <1.1.6:bond0.250-1.1.5:bond0.250> on 
network plane A
[ 1215.741533] tipc: Lost contact with <1.1.5>
[ 1215.745703] tipc: Lost link <1.1.6:bond0.250-1.1.1:bond0.250> on 
network plane A
[ 1215.753069] tipc: Lost contact with <1.1.1>
[ 1242.364457] BUG: soft lockup - CPU#0 stuck for 23s! [modprobe:6888]
(Continue reading)

Ying Xue | 21 Apr 04:55 2014

[PATCH net-next 00/11] purge tipc_net_lock

The TIPC routing hierarchy currently comprises the structures 'node',
'link' and 'bearer'. The whole hierarchy is protected by a big
read/write lock, tipc_net_lock, to ensure that nothing is added or
removed while code is accessing any of these structures. This locking
policy binds the node, link and bearer components closely together,
making their relationship unnecessarily complex. In the worst case,
such a policy not only hurts performance but is also prone to
occasional deadlock.

In order to decouple the complex relationships between bearer, node
and link, the locking policy is adjusted as follows:

- Bearer level
  The RTNL lock is used on the update side, and RCU on the read side.
  All bearer instances, including the broadcast bearer, are saved in
  the bearer_list array.

- Node and link level
  All node instances are saved in two lists, tipc_node_list and
  node_htable. Both lists are protected by node_list_lock on the
  write side and guarded by RCU on the read side. All members of the
  node structure, including link instances, are protected by the node
  spin lock.

- The relationship between bearer and node
  When a link accesses a bearer, it first looks the bearer up by its
  bearer identity in the bearer_list array. When a bearer accesses a
  node, it iterates the node_htable hash list by node address to find
  the corresponding node.

In the new locking policy, every component has its private locking
(Continue reading)

Ying Xue | 18 Apr 14:04 2014

[PATCH net-next 0/3] fix potential deadlocks when ports are scheduled/waken up

The port list lock (port_lock_list) and the port lock are taken in
reverse order when ports are scheduled, which can cause deadlock.
Meanwhile, there are still two special cases that result in deadlock
when ports are woken up, because the relationship between the node
lock and the port/socket lock is not handled correctly. This series
fixes the two issues and also converts the tipc_ports list to an RCU
list.

Ying Xue (3):
  tipc: fix a potential deadlock when port is scheduled
  tipc: fix a potential deadlock when port is waken up
  tipc: convert tipc_ports list to RCU list

 net/tipc/link.c |   65 ++++++++++++++++++++++++++++++++++---------------------
 net/tipc/port.c |   44 +++++++++++++++++++++++++------------
 net/tipc/port.h |   10 ++-------
 net/tipc/ref.c  |   13 +++++++++++
 net/tipc/ref.h  |    1 +
 5 files changed, 86 insertions(+), 47 deletions(-)

-- 
1.7.9.5

Ying Xue | 18 Apr 09:18 2014

[PATCH net-next v2] tipc: fix race in disc create/delete

Commit a21a584d6720ce349b05795b9bcfab3de8e58419 ("tipc: fix neighbor
detection problem after hw address change") introduces a race
condition involving tipc_disc_delete() and tipc_disc_add/remove_dest
that can cause TIPC to dereference the pointer to the bearer
discovery request structure after it has been freed, since a stray
pointer is left in the bearer structure.

To fix the issue, the process of resetting the discovery request
handler is simplified: the discovery request handler and request
buffer are reset in place instead of being freed, reallocated and
reinitialized. As the request pointer is always valid and the
request's lock is held while the handler is reset, the race can no
longer occur.

Reported-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Tested-by: Erik Hugne <erik.hugne@ericsson.com>
---
 v2:
   - update patch description with longer commit id
   - remove dest parameter from tipc_disc_reset()

 net/tipc/bearer.c   |    3 +--
 net/tipc/discover.c |   53 ++++++++++++++++++++++++++++++++++-----------------
 net/tipc/discover.h |    1 +
 3 files changed, 37 insertions(+), 20 deletions(-)

diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c
index 3abd970..f3259d4 100644
--- a/net/tipc/bearer.c
(Continue reading)

Ying Xue | 18 Apr 08:05 2014

[PATCH net-next v2 00/10] purge tipc_net_lock

The TIPC routing hierarchy currently comprises the structures 'node',
'link' and 'bearer'. The whole hierarchy is protected by a big
read/write lock, tipc_net_lock, to ensure that nothing is added or
removed while code is accessing any of these structures. This locking
policy binds the node, link and bearer components closely together,
making their relationship unnecessarily complex. In the worst case,
such a policy not only hurts performance but is also prone to
occasional deadlock.

In order to decouple the complex relationships between bearer, node
and link, the locking policy is adjusted as follows:

- Bearer level
  The RTNL lock is used on the update side, and RCU on the read side.
  All bearer instances, including the broadcast bearer, are saved in
  the bearer_list array.

- Node and link level
  All node instances are saved in two lists, tipc_node_list and
  node_htable. Both lists are protected by node_list_lock on the
  write side and guarded by RCU on the read side. All members of the
  node structure, including link instances, are protected by the node
  spin lock.

- The relationship between bearer and node
  When a link accesses a bearer, it first looks the bearer up by its
  bearer identity in the bearer_list array. When a bearer accesses a
  node, it iterates the node_htable hash list by node address to find
  the corresponding node.

In the new locking policy, every component has its private locking
(Continue reading)

Ying Xue | 8 Apr 10:17 2014

[PATCH] tipc: fix race in disc create/delete

Commit a21a58 ("tipc: fix neighbor detection problem after hw address
change") introduces a race condition involving tipc_disc_delete() and
tipc_disc_add/remove_dest that can cause TIPC to dereference the
pointer to the bearer discovery request structure after it has been
freed, since a stray pointer is left in the bearer structure.

To fix the issue, the process of resetting the discovery request
handler is simplified: the discovery request handler and request
buffer are reset in place instead of being freed, reallocated and
reinitialized. As the request pointer is always valid and the
request's lock is held while the handler is reset, the race can no
longer occur.

Reported-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
---
 net/tipc/bearer.c   |    3 +--
 net/tipc/discover.c |   55 ++++++++++++++++++++++++++++++++++-----------------
 net/tipc/discover.h |    1 +
 3 files changed, 39 insertions(+), 20 deletions(-)

diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c
index 3abd970..4e4da7b 100644
--- a/net/tipc/bearer.c
+++ b/net/tipc/bearer.c
@@ -365,9 +365,8 @@ restart:
 static int tipc_reset_bearer(struct tipc_bearer *b_ptr)
 {
 	pr_info("Resetting bearer <%s>\n", b_ptr->name);
-	tipc_disc_delete(b_ptr->link_req);
 	tipc_link_reset_list(b_ptr->identity);
(Continue reading)

erik.hugne | 4 Apr 13:21 2014

[RFC v4 0/3] tipc: link state subscriptions

From: Erik Hugne <erik.hugne@ericsson.com>

v4:
 -Revised commit message for all patches, and added a 
  description for the race resolved by patch #1
 -Instead of tipc_name_seq, peer should be a __u32 in struct tipc_sioc_ln_req
 -Renamed TIPC_LINK_SRV to TIPC_LINK_STATE
 -Changed pointer names l_ptr/n_ptr to link/node

Updated client program: https://gist.github.com/Hugne/9936175

v3:
Link publications are published/withdrawn directly from bh context in
tipc_node_link_up/down; there was never any need to defer this
processing. The computational overhead added by the nametable
operations during link up/down was enough to expose a race between
disc create/delete (the oops mentioned earlier). This issue is
resolved by patch #1.

A single subscription to publications of type TIPC_LINK_SRV will now
generate events for all network planes. The bearer identity is
included in the event.port.ref.

I've added a tipc_ prefix to the ioctl request struct name, and
a new u32 member for the bearer identity.

Erik Hugne (3):
  tipc: fix race in disc create/delete
  tipc: add support for link state subscriptions
  tipc: add ioctl to fetch link names
(Continue reading)

Ying Xue | 4 Apr 11:55 2014

[PATCH net-next 00/10] purge tipc_net_lock

The TIPC routing hierarchy currently comprises the structures 'node',
'link' and 'bearer'. The whole hierarchy is protected by a big
read/write lock, tipc_net_lock, to ensure that nothing is added or
removed while code is accessing any of these structures. This locking
policy binds the node, link and bearer components closely together,
making their relationship extremely complex. In the worst case, such
a policy not only hurts performance but is also prone to occasional
deadlock.

In order to decouple the complex relationships between bearer, node
and link, the locking policy is adjusted as follows:

- Bearer level
  The RTNL lock is used on the update side, and RCU on the read side.
  All bearer instances, including the broadcast bearer, are saved in
  the bearer_list array.

- Node and link level
  All node instances are saved in two lists, tipc_node_list and
  node_htable. Both lists are protected by node_list_lock on the
  write side and guarded by RCU on the read side. All members of the
  node structure, including link instances, are protected by the node
  spin lock.

- The relationship between bearer and node
  When a link accesses a bearer, it first looks the bearer up by its
  bearer identity in the bearer_list array. When a bearer accesses a
  node, it iterates the node_htable hash list by node address to find
  the corresponding node.

In the new locking policy, every component has its private locking
(Continue reading)

erik.hugne | 3 Apr 17:01 2014

[RFC v3 0/3] tipc: link state subscriptions

From: Erik Hugne <erik.hugne@ericsson.com>

v3: link publications are published/withdrawn directly from bh
context in tipc_node_link_up/down; there was never any need to defer
this processing. The computational overhead added by the nametable
operations during link up/down was enough to expose a race between
disc create/delete (the oops mentioned earlier). This issue is
resolved by patch #1.

A single subscription to publications of type TIPC_LINK_SRV will now
generate events for all network planes. The bearer identity is
included in the event.port.ref.

I've added a tipc_ prefix to the ioctl request struct name, and
a new u32 member for the bearer identity.

Updated client test program to reflect these changes:
https://gist.github.com/Hugne/9936175

Erik Hugne (3):
  tipc: fix race in disc create/delete
  tipc: add support for link state subscriptions
  tipc: add ioctl to fetch link names

 include/uapi/linux/tipc.h | 10 ++++++++++
 net/tipc/discover.c       |  5 +++++
 net/tipc/node.c           | 34 +++++++++++++++++++++++++++++++++-
 net/tipc/node.h           |  1 +
 net/tipc/socket.c         | 29 ++++++++++++++++++++++++++---
 5 files changed, 75 insertions(+), 4 deletions(-)
(Continue reading)

