Re: dladm show-ether doesn't show interfaces
On Mon, May 3, 2010 at 10:11 PM, Giovanni Tirloni
<gtirloni <at> sysdroid.com> wrote:
On Sun, May 2, 2010 at 10:56 PM, Giovanni Tirloni
<gtirloni-SSCLyIhHoYRWk0Htik3J/w@public.gmane.org> wrote:
Hello,
I think we've been hit by bug 6908043: dladm show-ether has stopped showing any interfaces at all.
All details added to http://www.sysdroid.com/opensolaris/bugs/6908043.txt
The only recent changes were switching the LACP timer from "short" to "long" on aggr0 (e1000g1+e1000g2) and, a few days ago, starting to run "dladm show-ether" from Zabbix to monitor interface status. So I don't know whether this was already happening and we never noticed, or whether heavy use of dladm show-ether is triggering the problem.
Turned out dlmgmtd(1M) was stuck and had to be restarted:
# svcadm restart datalink-management
# dladm show-ether
LINK PTYPE STATE AUTO SPEED-DUPLEX PAUSE
e1000g0 current up yes 1G-f bi
e1000g1 current up yes 1G-f bi
e1000g2 current up yes 1G-f bi
e1000g3 current up yes 1G-f bi
I'm still trying to understand what causes it. Perhaps dladm should have better error reporting in case it doesn't get a satisfactory answer from /dev/dld.
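Until the root cause is known, a monitoring script could at least detect the symptom. The sketch below is an assumption, not something from the bug report: has_links is a hypothetical helper that reads "dladm show-ether" output on stdin and succeeds only if at least one row follows the header line.

```shell
# has_links: hypothetical helper, not part of dladm.
# Reads "dladm show-ether" output on stdin. Exits 0 if at least one
# non-blank line follows the header, and 1 otherwise (including when
# the input is completely empty).
has_links() {
    awk 'NR > 1 && NF > 0 { n++ } END { exit (n > 0 ? 0 : 1) }'
}

# Possible use from cron or a Zabbix check (assumes root privileges):
#   dladm show-ether | has_links || svcadm restart datalink-management
```

This only covers the case seen here, where dladm returns without listing any link. It does not handle dladm itself blocking, and restarting the service is a workaround rather than a fix.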
There seems to be a memory leak in dlmgmtd since it's using 3.9GB of memory (12% of 32GB).
Should I file a new bug? If anyone is interested I can send the core dump.
I also updated the file below with DTrace output showing the functions called when "dladm show-ether" is run from another terminal on the same server (it's quite long).
http://www.sysdroid.com/opensolaris/bugs/6908043.txt
# ps aux | grep dlmgmt
USER PID %CPU %MEM SZ RSS TT S START TIME COMMAND
dladm 15 0.0 12.1 4039376 4039376 ? S Mar 30 6:47 /sbin/dlmgmtd
# gcore 15
# ls -lh core.15
-rw-r--r-- 1 root root 3.9G 2010-05-03 22:17 core.15
# pstack 15
15: /sbin/dlmgmtd
----------------- lwp# 1 / thread# 1 --------------------
feef0547 pause ()
08053a18 main (1, 8047e50, 8047e58, 8047e0c) + b8
0805326d _start (1, 8047ef0, 0, 8047efe, 8047f0e, 8047f1f) + 7d
----------------- lwp# 2 / thread# 2 --------------------
feef0ea1 door (fec9e980, 410, 0, fec9ee00, f5f00, a)
08054b98 dlmgmt_handler (0, fec9edd8, 28, 0, 0, 8054a9c) + fc
feef0ed2 __door_return () + 52
----------------- lwp# 3 / thread# 3 --------------------
feef0ea1 door (feb9f980, 410, 0, feb9fe00, f5f00, a)
08054b98 dlmgmt_handler (0, feb9fdd8, 28, 0, 0, 8054a9c) + fc
feef0ed2 __door_return () + 52
----------------- lwp# 4 / thread# 4 --------------------
feef0ea1 door (fea7e980, 410, 0, fea7ee00, f5f00, a)
08054b98 dlmgmt_handler (0, fea7edd8, 28, 0, 0, 8054a9c) + fc
feef0ed2 __door_return () + 52
----------------- lwp# 5 / thread# 5 --------------------
feef0ea1 door (fe80ed90, 18, 0, fe80ee00, f5f00, a)
08054b98 dlmgmt_handler (0, fe80ede8, 18, 0, 0, 8054a9c) + fc
feef0ed2 __door_return () + 52
----------------- lwp# 6 / thread# 6 --------------------
feef0ea1 door (fe70f980, 410, 0, fe70fe00, f5f00, a)
08054b98 dlmgmt_handler (0, fe70fdd8, 28, 0, 0, 8054a9c) + fc
feef0ed2 __door_return () + 52
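For reference, the 3.9GB and 12% figures follow from the RSS column of the ps output above, assuming ps reports it in kilobytes. A small sketch of the arithmetic (kb_to_gb and rss_percent are hypothetical helper names, and 32 is the machine's memory size in GB as stated above):

```shell
# kb_to_gb KB: convert kilobytes to gigabytes (1 GB = 1024 * 1024 KB),
# printed with one decimal place.
kb_to_gb() {
    awk -v kb="$1" 'BEGIN { printf "%.1f\n", kb / (1024 * 1024) }'
}

# rss_percent RSS_KB TOTAL_GB: resident set size as a percentage of
# total memory, printed with one decimal place.
rss_percent() {
    awk -v rss="$1" -v total="$2" \
        'BEGIN { printf "%.1f\n", rss / (total * 1024 * 1024) * 100 }'
}

kb_to_gb 4039376        # size of dlmgmtd in GB
rss_percent 4039376 32  # share of a 32GB machine
```

Logging these two values periodically would show whether dlmgmtd grows steadily or jumps after particular operations.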
--
Giovanni