list corruption in IPOIB
Hi Shlomo & Or,
We've seen the neigh->list list corruption warning below during testing.
Dongsu and I think several places also need
netif_tx_lock(_bh)/netif_tx_unlock(_bh) pairs around accesses to
neigh->list. I tried adding netif_tx_lock/netif_tx_unlock to
ipoib_cm_destroy_tx, and it improved the situation. There are other
places in ipoib_main.c and ipoib_mcast.c that touch the list, but I
don't know which lock should be used there. If you can take some time
to look into it, that would be great.
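To illustrate what the warning means and the kind of serialization we
have in mind, here is a minimal user-space sketch, not the ipoib code.
It assumes a pthread mutex standing in for netif_tx_lock, and the names
checked_list_del and locked_remove are made up for the example. The
first poison value is the one printed in the log below; the second is
only illustrative. Deleting an entry poisons its pointers, so a second
delete of the same entry is detected instead of corrupting the list.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct node { struct node *next, *prev; };

/* POISON1 matches the value in the log; POISON2 is illustrative. */
#define POISON1 ((struct node *)(uintptr_t)0xdead000000100100ULL)
#define POISON2 ((struct node *)(uintptr_t)0xdead000000200200ULL)

/* Stand-in for netif_tx_lock/netif_tx_unlock (assumption). */
static pthread_mutex_t tx_lock = PTHREAD_MUTEX_INITIALIZER;

static void list_init(struct node *h) { h->next = h->prev = h; }

/* Insert n right after the list head h. */
static void list_add(struct node *n, struct node *h)
{
    n->next = h->next;
    n->prev = h;
    h->next->prev = n;
    h->next = n;
}

/* Unlink n; refuse and warn if n was already unlinked. */
static bool checked_list_del(struct node *n)
{
    if (n->next == POISON1 || n->prev == POISON2) {
        fprintf(stderr, "list_del corruption, %p is poisoned\n", (void *)n);
        return false;
    }
    n->next->prev = n->prev;
    n->prev->next = n->next;
    n->next = POISON1;
    n->prev = POISON2;
    return true;
}

/* Hypothetical teardown path: check and unlink under one lock. */
static bool locked_remove(struct node *n)
{
    bool ok;

    assert(n != NULL);
    pthread_mutex_lock(&tx_lock);
    ok = checked_list_del(n);
    pthread_mutex_unlock(&tx_lock);
    return ok;
}
```

If every path that adds to or removes from the list takes the same
lock, the check and the unlink cannot interleave with another CPU's
update. The lock alone does not prevent a second delete of the same
entry; it only makes it detectable in a consistent state.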
May 17 15:17:57 ib2 kernel: [ 274.910792] ib0: failed to send RTU: -22
May 17 15:17:59 ib2 kernel: [ 276.118006] ib0: enabling connected mode will cause multicast packet drops
May 17 15:18:01 ib2 kernel: [ 278.557566] ib0: enabling connected mode will cause multicast packet drops
May 17 15:18:02 ib2 kernel: [ 279.793565] ib0: failed to send cm req: -22
May 17 15:18:02 ib2 kernel: [ 279.793713] ------------[ cut here ]------------
May 17 15:18:02 ib2 kernel: [ 279.793779] WARNING: at lib/list_debug.c:49 __list_del_entry+0x63/0xd0()
May 17 15:18:02 ib2 kernel: [ 279.793840] Hardware name: System Product Name
May 17 15:18:02 ib2 kernel: [ 279.793898] list_del corruption, ffff8801f9708740->next is LIST_POISON1 (dead000000100100)
May 17 15:18:02 ib2 kernel: [ 279.794013] Modules linked in: rdma_ucm
rdma_cm iw_cm ib_addr ib_ipoib ib_cm ib_sa ib_uverbs ib_umad mlx4_ib
ib_mad ib_core ip6table_filter ip6_tables iptable_filter ip_tables
ebtable_nat ebtables x_tables cpufreq_powersave cpufreq_conservative
cpufreq_stats cpufreq_userspace binfmt_misc fuse loop kvm_amd kvm
psmouse powernow_k8 tpm_tis tpm tpm_bios serio_raw edac_core mperf evdev