Erik Hugne | 3 May 07:08 2016

Re: tipc: tipc_recv_stream with kernel panic

(On mobile)

At first glance, it seems that the socket was freed while there was still a
pending wakeup signal for it, which then caused the subsequent
spin_lock_bh() to dereference freed memory.

//E

On May 3, 2016 02:43, "GUNA" <gbalasun <at> gmail.com> wrote:
[...]
> [375832.498126] BUG: unable to handle kernel paging request at 000001a400015ff4
> [375832.505300] IP: [<ffffffff810c3566>] queued_spin_lock_slowpath+0xe6/0x160
> [375832.512394] PGD 0
> [375832.514657] Oops: 0002 [#1] SMP
> [375832.518306] Modules linked in: nf_log_ipv6 nf_log_ipv4
> nf_log_common xt_LOG sctp libcrc32c e1000e tipc udp_tunnel
> ip6_udp_tunnel 8021q garp iTCO_wdt xt_physdev br_netfilter bridge stp
> llc nf_conntrack_ipv4 nf_defrag_ipv4 ipmiq_drv(O) sio_mmc(O)
> ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state
> nf_conntrack lockd ip6table_filter event_drv(O) ip6_tables grace
> pt_timer_info(O) ddi(O) usb_storage ixgbe igb i2c_i801
> iTCO_vendor_support i2c_algo_bit ioatdma intel_ips i2c_core pcspkr
> sunrpc ptp mdio dca pps_core lpc_ich tpm_tis mfd_core tpm [last
> unloaded: iTCO_wdt]
> [375832.573693] CPU: 4 PID: 0 Comm: swapper/4 Tainted: G           O    4.4.0 #14
> [375832.581385] Hardware name: PT AMC124/Base Board Product Name, BIOS LGNAJFIP.PTI.0012.P15 01/15/2014
[...]
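Erik's diagnosis describes a use-after-free: the socket memory is released while a wakeup still refers to it, so the later spin_lock_bh() operates on freed memory. A common way to rule this out is to have every pending wakeup hold its own reference, so the object is freed only when the last holder lets go (in the kernel, sockets are pinned with sock_hold() and released with sock_put()). The C sketch below shows only that pattern in user space; struct demo_sock and the demo_* functions are hypothetical, the counter is not atomic, and none of this is TIPC's actual code or a confirmed fix for this crash.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a socket; not TIPC's struct tipc_sock. */
struct demo_sock {
    int refcnt;   /* owner's reference + one per pending wakeup */
    int wakeups;  /* number of wakeups delivered so far */
};

/* Counts real frees so callers can observe the object's lifetime. */
int demo_frees = 0;

struct demo_sock *demo_sock_new(void)
{
    struct demo_sock *sk = calloc(1, sizeof(*sk));

    if (!sk)
        abort();
    sk->refcnt = 1;             /* the owner's reference */
    return sk;
}

void demo_sock_hold(struct demo_sock *sk)
{
    assert(sk->refcnt > 0);     /* never resurrect a freed object */
    sk->refcnt++;
}

void demo_sock_put(struct demo_sock *sk)
{
    assert(sk->refcnt > 0);
    if (--sk->refcnt == 0) {
        free(sk);
        demo_frees++;
    }
}

/* Queue a deferred wakeup: pin the socket before anyone can close it. */
struct demo_sock *demo_queue_wakeup(struct demo_sock *sk)
{
    demo_sock_hold(sk);
    return sk;
}

/* Deliver the wakeup. The pinned reference guarantees sk is still valid
 * here, so locking or touching it is safe; afterwards drop the pin. */
void demo_run_wakeup(struct demo_sock *sk)
{
    sk->wakeups++;
    demo_sock_put(sk);
}
```

Closing the socket while a wakeup is queued leaves demo_frees at 0; the memory is released only when demo_run_wakeup() drops the last reference. Multi-threaded code would need an atomic counter instead of a plain int.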

GUNA | 3 May 02:42 2016

tipc: tipc_recv_stream with kernel panic

The following TIPC traces were collected after the cards were forced to
reboot to recover them.
The kernel is 4.4.0, with some of the latest TIPC patches applied.

[   65.954959] sm-msp-queue[1279]: unable to qualify my own domain name (dcsx5testslot3) -- using short name
[  632.098785] perf interrupt took too long (2505 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
[ 5880.428123] perf interrupt took too long (5585 > 5000), lowering kernel.perf_event_max_sample_rate to 25000
[17934.014969] CE: hpet increased min_delta_ns to 20115 nsec
[38956.721789] CE: hpet4 increased min_delta_ns to 20115 nsec
[46927.872827] hrtimer: interrupt took 63361 ns
[101662.241093] CE: hpet2 increased min_delta_ns to 20115 nsec
[245973.044600] CE: hpet6 increased min_delta_ns to 20115 nsec
[368639.565040] show_signal_msg: 6 callbacks suppressed
[375832.498126] BUG: unable to handle kernel paging request at 000001a400015ff4
[375832.505300] IP: [<ffffffff810c3566>] queued_spin_lock_slowpath+0xe6/0x160
[375832.512394] PGD 0
[375832.514657] Oops: 0002 [#1] SMP
[375832.518306] Modules linked in: nf_log_ipv6 nf_log_ipv4
nf_log_common xt_LOG sctp libcrc32c e1000e tipc udp_tunnel
ip6_udp_tunnel 8021q garp iTCO_wdt xt_physdev br_netfilter bridge stp
llc nf_conntrack_ipv4 nf_defrag_ipv4 ipmiq_drv(O) sio_mmc(O)
ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state
nf_conntrack lockd ip6table_filter event_drv(O) ip6_tables grace
pt_timer_info(O) ddi(O) usb_storage ixgbe igb i2c_i801
iTCO_vendor_support i2c_algo_bit ioatdma intel_ips i2c_core pcspkr
sunrpc ptp mdio dca pps_core lpc_ich tpm_tis mfd_core tpm [last
unloaded: iTCO_wdt]
[...]

GUNA | 2 May 19:22 2016

Re: Tipc: name table mismatch between different cards in a system

Is there any possibility of getting the fix soon? Our audit scripts raise
an alarm because of the name table mismatch. If you point me to the code
that needs fixing, I will fix it in my kernel. I am using kernel 4.4.0 on
a Fedora distribution.
Thanks in advance.
Guna

On Fri, Apr 29, 2016 at 11:55 AM, Jon Maloy <jon.maloy <at> ericsson.com> wrote:
>
>
>> -----Original Message-----
>> From: GUNA [mailto:gbalasun <at> gmail.com]
>> Sent: Friday, 29 April, 2016 10:48
>> To: Jon Maloy
>> Cc: tipc-discussion <at> lists.sourceforge.net
>> Subject: Re: Tipc: name table mismatch between different cards in a system
>>
>> The two skb_linearize() calls and the 'hdr' update fixes are
>> already in my load, but they did not solve this issue. The issue remains the
>> same even after today's ACTIVE state fix (before that fix, one of the links
>> was in STANDBY even with equal priority).
>>
>> // IO card; note this does not run the latest kernel or TIPC
>> [root@10 ~]# tipc-config -nt |grep 2334480598
>>            20012      20012      <1.1.12:2334480598>        2334480599  cluster
>>
>> // runs the latest kernel on all CPU cards.
>> [root@2 ~]# tipc-config -nt |grep 2334480598
>> 50009      20012      20012      <1.1.12:2334480598>        2334480598  cluster
>
[...]

Jon Maloy | 2 May 16:22 2016

[PATCH net-next 0/3] tipc: redesign socket-level flow control

The socket-level flow control in TIPC has long been due for a major
overhaul. This series fixes this.

Jon Maloy (3):
  tipc: re-enable compensation for socket receive buffer double counting
  tipc: propagate peer node capabilities to socket layer
  tipc: redesign connection-level flow control

 net/tipc/core.c   |   8 ++-
 net/tipc/msg.h    |  14 +++++-
 net/tipc/node.c   |  21 +++++++-
 net/tipc/node.h   |   6 ++-
 net/tipc/socket.c | 144 +++++++++++++++++++++++++++++++++++-------------------
 net/tipc/socket.h |  17 +++++--
 6 files changed, 145 insertions(+), 65 deletions(-)

--

-- 
1.9.1

Jon Maloy | 2 May 15:18 2016

Re: [PATCH] tipc: Only process unicast on intended node


> -----Original Message-----
> From: Hamish Martin [mailto:Hamish.Martin <at> alliedtelesis.co.nz]
> Sent: Sunday, 01 May, 2016 16:44
> To: Jon Maloy; Jon Maloy
> Cc: tipc-discussion <at> lists.sourceforge.net; Ying Xue; Xue Ying
> (ying.xue0 <at> gmail.com); Richard Alpe; Parthasarathy Bhuvaragan
> Subject: Re: [PATCH] tipc: Only process unicast on intended node
> 
> Hi Jon,
> 
> The broadcast transmitters were definitely the "steady" ones that
> weren't being rebooted. I can't rule out the other nodes having the
> issue too, but since it was so hard to find I followed the reproduction
> path I could see most easily. In that case it was a "steady" node that
> was transmitting to a new node and the "steady" node processed the ack
> from some other node as though it came from the new node.

Do you have a Wireshark dump? Matching MAC address to TIPC node would show where these messages
really came from, i.e. whether it is dest_node or prev_node that is wrong.
Also, did you really start new machines after you had removed the old ones, or did you just re-use the same
machine/VM with a new TIPC node id?

> 
> Also, I found this on v4.4.6 so I recommend, given the seriousness of
> the symptoms, that it be put in the 4.4 and 4.5 branches too.

That is why I posted this to 'net'. That way, it will be applied as far back as possible, and also "forward" into
the ongoing 4.6 cycle in the near future.

[...]

GUNA | 29 Apr 22:55 2016

tipc utility: No tipc binary in iproute2

I have compiled iproute2 on the server as well as on the target. In both
cases, the "tipc" utility is not built, while the rest of the utilities
build fine.

I tried iproute2-4.4.0 and iproute2-4.5.0.

===
make clean
make
...
make[1]: Leaving directory `/root/iproute2-4.4.0/genl'
make[1]: Entering directory `/root/iproute2-4.4.0/tipc'
make[1]: Nothing to be done for `all'.
make[1]: Leaving directory `/root/iproute2-4.4.0/tipc'
make[1]: Entering directory `/root/iproute2-4.4.0/man'

Am I missing anything?

Thank you,
Guna

GUNA | 29 Apr 21:32 2016

Re: Tipc: name table mismatch between different cards in a system

Thank you Jon.

Richard, could you let me know the fix for this please.

thanks,
Guna

On Fri, Apr 29, 2016 at 11:55 AM, Jon Maloy <jon.maloy <at> ericsson.com> wrote:
>
>
>> -----Original Message-----
>> From: GUNA [mailto:gbalasun <at> gmail.com]
>> Sent: Friday, 29 April, 2016 10:48
>> To: Jon Maloy
>> Cc: tipc-discussion <at> lists.sourceforge.net
>> Subject: Re: Tipc: name table mismatch between different cards in a system
>>
>> The two skb_linearize() calls and the 'hdr' update fixes are
>> already in my load, but they did not solve this issue. The issue remains the
>> same even after today's ACTIVE state fix (before that fix, one of the links
>> was in STANDBY even with equal priority).
>>
>> // IO card; note this does not run the latest kernel or TIPC
>> [root@10 ~]# tipc-config -nt |grep 2334480598
>>            20012      20012      <1.1.12:2334480598>        2334480599  cluster
>>
>> // runs the latest kernel on all CPU cards.
>> [root@2 ~]# tipc-config -nt |grep 2334480598
>> 50009      20012      20012      <1.1.12:2334480598>        2334480598  cluster
>
[...]

GUNA | 28 Apr 16:27 2016

Kernel 4.4.0 TIPC: links were bouncing and not stable enough

Hi Jon,

Back to debugging the table mismatch and standby link issues...

I need to clarify two items first, as described below. Both issues are
reported by our audit script, which works fine with kernel 3.4.2 but not
with the new kernel 4.4.0.

1. Table mismatch
This is due to a bunch of entries with type 2 and "node" scope that differ
from each other.
Since type 2 is internal and has "node" scope, do we expect these entries
to match the other nodes' tables? Has anything changed in the latest TIPC?

// slot3

2          16781314   16781314   <1.1.3:0>                  0           node
2          16781314   16781314   <1.1.3:1>                  1           node
2          16781324   16781324   <1.1.3:1>                  1           node
2          16781324   16781324   <1.1.3:0>                  0           node
2          16781325   16781325   <1.1.3:0>                  0           node
2          16781325   16781325   <1.1.3:1>                  1           node

// slot2
Type       Lower      Upper      Port Identity              Publication Scope

2          16781315   16781315   <1.1.2:0>                  0           node
2          16781315   16781315   <1.1.2:1>                  1           node
2          16781324   16781324   <1.1.2:0>                  0           node
[...]
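Whether the audit should skip these rows is exactly what is being asked here. For background, TIPC publications with node scope are not distributed to other nodes, so such rows would not be expected to match across cards. If an audit script were to exclude them before comparing tables, a minimal C filter could look like the sketch below. It assumes the whitespace-separated "tipc-config -nt" layout shown above, with the scope in the last column; is_node_scope() is a hypothetical helper, not part of any TIPC tool.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true if a name table line ends with the scope word "node".
 * Assumes the scope is the last whitespace-separated column. */
bool is_node_scope(const char *line)
{
    size_t len;

    assert(line != NULL);
    len = strlen(line);

    /* ignore trailing blanks and line endings */
    while (len > 0 && (line[len - 1] == ' ' || line[len - 1] == '\t' ||
                       line[len - 1] == '\n' || line[len - 1] == '\r'))
        len--;

    if (len < 4 || strncmp(line + len - 4, "node", 4) != 0)
        return false;

    /* "node" must be a whole column, not the tail of a longer word */
    return len == 4 || line[len - 5] == ' ' || line[len - 5] == '\t';
}
```

An audit would then compare only the lines for which is_node_scope() returns false, such as the cluster-scope rows shown earlier in the thread.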

Jon Maloy | 27 Apr 16:26 2016

[PATCH net-next v4 1/1] tipc: add neighbor monitoring framework

TIPC-based clusters are by default set up with full-mesh link
connectivity between all nodes. Those links are expected to provide
a short failure detection time, by default set to 1500 ms. Because
of this, the background load for neighbor monitoring in an N-node
cluster increases with a factor N on each node, while the overall
monitoring traffic through the network infrastructure increases at
a ~(N * (N - 1)) rate. Experience has shown that such clusters don't
scale well beyond ~100 nodes unless we significantly increase failure
discovery tolerance.

This commit introduces a framework and an algorithm that drastically
reduces this background load, while basically maintaining the original
failure detection times across the whole cluster. Using this algorithm,
background load will now grow at a rate of ~(2 * sqrt(N)) per node, and
at ~(2 * N * sqrt(N)) in traffic overhead. As an example, each node will
now have to actively monitor 38 neighbors in a 400-node cluster, instead
of 399 as before.
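The figures in the paragraph above can be checked with a little arithmetic. The excerpt gives only approximate growth rates, so the per-node formula used here, 2 * (sqrt(N) - 1), is an assumption chosen because it reproduces the quoted 38 monitored neighbors for N = 400. The function names are hypothetical.

```c
#include <assert.h>

/* Largest r with r * r <= n (integer square root, no libm needed). */
unsigned int isqrt_floor(unsigned int n)
{
    unsigned int r = 0;

    while ((r + 1) * (r + 1) <= n)
        r++;
    return r;
}

/* Full mesh: every node supervises all of its N - 1 peers. */
unsigned int full_mesh_per_node(unsigned int n)
{
    assert(n > 0);
    return n - 1;
}

/* Full mesh: N nodes each supervising N - 1 peers, ~N * (N - 1) overall. */
unsigned long full_mesh_total(unsigned int n)
{
    assert(n > 0);
    return (unsigned long)n * (n - 1);
}

/* Ring supervision, assumed per-node count: 2 * (sqrt(N) - 1). */
unsigned int ring_per_node(unsigned int n)
{
    unsigned int s = isqrt_floor(n);

    assert(s > 0);
    return 2 * (s - 1);
}
```

For N = 400, full_mesh_per_node() returns 399 and full_mesh_total() returns 159600, while ring_per_node() returns 38, matching the numbers in the commit message.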

This "Overlapping Ring Supervision Algorithm" is completely distributed
and employs no centralized or coordinated state. It goes as follows:

- Each node makes up a linearly ascending, circular list of all its N
  known neighbors, based on their TIPC node identity. This algorithm
  must be the same on all nodes.

- The node then selects the next M = sqrt(N) - 1 nodes downstream from
  itself in the list, and chooses to actively monitor those. This is
  called its "local monitoring domain".

- It creates a domain record describing the monitoring domain, and
[...]
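The first two steps of the algorithm, building the sorted circular list and picking the local monitoring domain, can be sketched in C. Assumptions: node identities are plain 32-bit integers, the ring includes the local node itself (so that "downstream from itself" is well defined), and sqrt(N) is rounded down. local_domain() is a hypothetical helper, not TIPC's implementation; the domain record and the remaining steps are cut off in this excerpt and are not modelled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort comparator for 32-bit node identities, ascending. */
static int cmp_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a;
    uint32_t y = *(const uint32_t *)b;

    return (x > y) - (x < y);
}

/* Floor of sqrt(n). */
static size_t ring_isqrt(size_t n)
{
    size_t r = 0;

    while ((r + 1) * (r + 1) <= n)
        r++;
    return r;
}

/*
 * ring[] holds the identities of all n nodes, including self, in any
 * order; it is sorted in place and treated as circular. The
 * M = sqrt(n) - 1 nodes following self are written to domain[],
 * which must have room for at least n entries. Returns M.
 */
size_t local_domain(uint32_t self, uint32_t *ring, size_t n, uint32_t *domain)
{
    size_t i, pos = n, m;

    qsort(ring, n, sizeof(*ring), cmp_u32);
    for (i = 0; i < n; i++)
        if (ring[i] == self)
            pos = i;
    assert(pos < n);                 /* self must be a ring member */

    m = ring_isqrt(n);
    m = m > 0 ? m - 1 : 0;
    for (i = 0; i < m; i++)
        domain[i] = ring[(pos + 1 + i) % n];   /* wrap past the end */
    return m;
}
```

For a 16-node ring with identities 1 to 16, node 15 gets M = 3 and monitors 16, 1 and 2, wrapping around the end of the list. Because every node sorts the same set of identities, each one arrives independently at the same picture of who monitors whom, which is why the ordering must be identical on all nodes.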

Parthasarathy Bhuvaragan | 27 Apr 09:14 2016

Re: tipcutils 2.1.1 build issue

Hi Guna,

Always send emails to tipc-discussion, instead of addressing individuals.

For your specific problem, set the CFLAGS before issuing configure as:

./bootstrap
CFLAGS=-I/root/rpmbuild/BUILD/linux-4.4/usr/include ./configure 

regards
Partha

On 04/26/2016 06:20 PM, GUNA wrote:
> Hi,
>
> I tried to use the latest tipcutils and compiled it against my new kernel tree
> (4.4.0) as follows:
>
> git clone git://tipc.git.sourceforge.net/gitroot/tipc/tipcutils
> cd tipc-tipcutils
> ./bootstrap
> ./configure CFLAGS=-I /root/rpmbuild/BUILD/linux-4.4/usr/include
> make   << == error as follows
>
> tipcutils]# make
> make: *** No targets specified and no makefile found.  Stop.
>
> My target is a switch running kernel 4.4.0. I am compiling on another
> server running kernel 3.4.2. Why did "make" fail, and what target do I
> need to specify? I don't want to install any on compiler.
[...]

Richard Alpe | 25 Apr 10:52 2016

[PATCH iproute2 1/3] tipc: fix UDP bearer synopsis

Local IP is not required to identify a UDP bearer and shouldn't be
passed to bearer disable, set or get. In this patch we remove the
localip entry from the synopsis of these functions.

Signed-off-by: Richard Alpe <richard.alpe <at> ericsson.com>
---
 man/man8/tipc-bearer.8 | 14 ++++----------
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/man/man8/tipc-bearer.8 b/man/man8/tipc-bearer.8
index 50a1ed2..846f1db 100644
--- a/man/man8/tipc-bearer.8
+++ b/man/man8/tipc-bearer.8
@@ -39,14 +39,12 @@ tipc-bearer \- show or modify TIPC bearers
 .B tipc bearer disable media
 .br
 .RB "{ { " eth " | " ib " } " device
-.IR DEVICE
+.IR "DEVICE " }
 .RB "|"
 .br
 .RB "{ " udp
 .B name
-.IR NAME
-.B localip
-.IR LOCALIP " } }"
+.IR NAME " }"
 .br

 .ti -8
[...]

