Bill Werner | 1 Mar 17:25 2011

Solaris 11Exp -> Solaris 10 9000 MTU = flapping link?

So here's what I have...

I have a Solaris 11 non-global zone running on a Solaris 11 global.  The NGZ is connected to the global zone via
exclusive IP, using VNICs and an etherstub.  The global zone is then connected through a switch to another
Solaris 10 box.

The VNICs between the Solaris 11 NGZ and GZ use the default 9000 MTU, and the Solaris 11 GZ and Solaris 10 box
communicate over standard Ethernet at 1500 MTU.

The Solaris 11 GZ is of course configured as a router.

Small-packet traffic such as ping, ssh, telnet, etc. between the 11 NGZ and the Solaris 10 box works just fine.

When I try to do large transfers, such as NFSv4 copies, the Solaris 10 box keeps bringing its physical link
up and down (it's an e1000g card).

If I change the MTU size to 1500 on the vnics, the problem goes away.

So it appears Solaris 10 is choking on the large packets.  I would expect some issues from that, but I wouldn't
expect the physical link to flap.

But the main issue would seem to be: why isn't the Solaris 11 GZ doing path MTU discovery and dropping the MTU
down to 1500?

Re: Solaris 11Exp -> Solaris 10 9000 MTU = flapping link?


Bill Werner wrote:
> When I try to do large transfers, such as NFSv4 copies, the Solaris 10 box keeps bringing its physical link up and down (it's an e1000g card).
>   

There have been link flapping issues with e1000g (6633239 springs to 
mind). Do you have the latest e1000g patch installed? By this I mean: 
do you have the latest patch which delivers e1000g? (It was included in, 
e.g., the 142909-17 kernel patch.)

If you do, then you may have hit a new bug. If patching your S10 system 
doesn't help, please log a support call to have this diagnosed and 
fixed.

> If I change the MTU size to 1500 on the vnics, the problem goes away.
>
> So it appears Solaris 10 is choking on the large packets.  I would expect some issues from that, but I wouldn't expect the physical link to flap.
>
> But the main issue would seem to be: why isn't the Solaris 11 GZ doing path MTU discovery and dropping the MTU down to 1500?
>   

I'm not 100% clear on PMTU, but I understood it to rely on the 
ICMP_UNREACH_NEEDFRAG message. If the target host is taking the 
interface down rather than sending a reply, I'd imagine the S11 system 
would not know to lower the MTU - it never received a "need to fragment".

I suspect you would still see this behaviour without the link flapping 
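
For reference, the message that path MTU discovery depends on is the ICMP 
"destination unreachable / fragmentation needed" error (type 3, code 4), 
which carries the next-hop MTU in the last two bytes of the ICMP header 
(RFC 1191). Below is a minimal, stand-alone C sketch of what a PMTUD 
implementation looks for. It only illustrates the message format; it is 
not taken from the Solaris IP code.

/*
 * Minimal sketch: recognise an ICMP "fragmentation needed" error and
 * pull out the next-hop MTU (RFC 1191).  Offsets are relative to the
 * start of the ICMP header.  Illustration only.
 */
#include <netinet/ip_icmp.h>    /* ICMP_UNREACH, ICMP_UNREACH_NEEDFRAG */
#include <stdio.h>
#include <stddef.h>
#include <stdint.h>

static int
needfrag_mtu(const uint8_t *icmp, size_t len)
{
        if (len < 8)
                return (-1);
        if (icmp[0] != ICMP_UNREACH || icmp[1] != ICMP_UNREACH_NEEDFRAG)
                return (-1);
        /* bytes 6-7: next-hop MTU, network byte order */
        return ((icmp[6] << 8) | icmp[7]);
}

int
main(void)
{
        /* A hand-built example: type 3, code 4, next-hop MTU 1500. */
        uint8_t msg[8] = { 3, 4, 0, 0, 0, 0, 0x05, 0xdc };

        printf("next-hop MTU = %d\n", needfrag_mtu(msg, sizeof (msg)));
        return (0);
}

If the S10 box never sends such a message (because it is busy resetting
its link instead), there is nothing for the S11 router to react to.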

Nils Goroll | 13 Mar 20:55 2011

ioctl(sock, FIONBIO, 0) returns ECONNREFUSED after non-blocking connect() or accept()

Hi,

I am having a hard time understanding the issue documented in
http://www.varnish-cache.org/trac/ticket/865.

This happens at least on snv_124 and snv_111b with the latest patches. Basically, it
looks like so_error could contain ECONNREFUSED after a non-blocking connect() or
accept().

Any pointers or hints would be appreciated.

Thanks, Nils
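
In case it helps frame the question, the sequence involved boils down to
something like the sketch below (assumed loopback address and port, not
the actual varnish code). The ticket is about ioctl(FIONBIO) failing;
the sketch reads so_error via getsockopt(SO_ERROR) instead, which is the
conventional way to check the result of a non-blocking connect():

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <fcntl.h>
#include <poll.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

int
main(void)
{
        int s = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in sin;
        struct pollfd pfd;
        int err = 0;
        socklen_t len = sizeof (err);

        if (s < 0) {
                perror("socket");
                return (1);
        }

        memset(&sin, 0, sizeof (sin));
        sin.sin_family = AF_INET;
        sin.sin_port = htons(8080);                     /* assumed test port */
        sin.sin_addr.s_addr = inet_addr("127.0.0.1");   /* assumed test host */

        /* switch the socket to non-blocking mode */
        (void) fcntl(s, F_SETFL, fcntl(s, F_GETFL, 0) | O_NONBLOCK);

        if (connect(s, (struct sockaddr *)&sin, sizeof (sin)) < 0 &&
            errno != EINPROGRESS) {
                perror("connect");
                return (1);
        }

        /* wait for the connect to complete (or fail) */
        pfd.fd = s;
        pfd.events = POLLOUT;
        pfd.revents = 0;
        (void) poll(&pfd, 1, -1);

        /* so_error is where the reported ECONNREFUSED would show up */
        (void) getsockopt(s, SOL_SOCKET, SO_ERROR, &err, &len);
        printf("SO_ERROR after connect: %d (%s)\n", err, strerror(err));
        return (0);
}

(On Solaris this needs -lsocket -lnsl to link.)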
Rao Shoaib | 13 Mar 22:01 2011

Re: ioctl(sock, FIONBIO, 0) returns ECONNREFUSED after non-blocking connect() or accept()

Nils Goroll wrote:
> Hi,
>
> I am having a hard time understanding the issue documented in
> http://www.varnish-cache.org/trac/ticket/865.
>
> This happens at least on snv_124 and snv_111b with latest patches. Basically, it
> looks like so_error could contain ECONNREFUSED after a non-blocking connect() or
> accept().
>
> Any pointers or hints would be appreciated.
>
> Thanks, Nils
>   
Can you provide a test case with which this problem can be reproduced?
The comments suggest that the connect attempt succeeds whereas Solaris 
thinks it did not. Can you provide the netstat output to confirm that?

Thanks,

Rao.

River Tarnell | 17 Mar 22:37 2011

ip_mdata_to_mhi vs PPPoE


Hi,

I have an ADSL (PPPoE) router running oi_148.  Over the last couple of 
days I've been trying to track down an odd network problem: sometimes, 
when two packets arrive close together, the second packet would be 
dropped.  The problem would appear for a few minutes, then disappear, 
seemingly at random.

Dan McDonald was able to trace the drop to ire_recv_forward_v4(); 
specifically, the packet had MAC_ADDRTYPE_MULTICAST set, so it was 
dropped ("l2 multicast not forwarded");

After examining the code, I believe the problem is as follows:

ip_mdata_to_mhi is responsible for receiving the IP packet, and looking 
at the Ethernet frame to determine if it's a broadcast/multicast packet, 
based on the broadcast bit in the destination address.  It assumes the 
data is structured as follows:

<Ethernet header> -> [optional VLAN headers] -> <IP packet>

To find the Ethernet header, it first looks directly before the IP 
packet (line 7836).  If the ethertype field is neither ETHERTYPE_IP nor 
ETHERTYPE_IPV6, it assumes there's a VLAN header, shifts the pointer 4 
bytes backward, and starts again.
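
Schematically (this is a hypothetical stand-alone rendering of that
search, not the actual ip.c code), the logic amounts to:

/*
 * Sketch of the backward ethertype search described above.  Starting
 * 14 bytes before the IP header, read the would-be ethertype; if it
 * is not IP or IPv6, assume a VLAN tag and step back 4 more bytes.
 */
#include <stdint.h>
#include <stddef.h>

#define ETHERTYPE_IP    0x0800
#define ETHERTYPE_IPV6  0x86dd
#define ETHER_HDR_LEN   14      /* dst(6) + src(6) + type(2) */
#define VLAN_TAG_LEN    4

/*
 * Return a pointer to the presumed Ethernet destination address, or
 * NULL if no IP/IPv6 ethertype is found before 'start' (the beginning
 * of the received buffer).
 */
static const uint8_t *
find_ether_dst(const uint8_t *start, const uint8_t *ip_hdr)
{
        const uint8_t *eth = ip_hdr - ETHER_HDR_LEN;

        while (eth >= start) {
                uint16_t type = (eth[12] << 8) | eth[13];

                if (type == ETHERTYPE_IP || type == ETHERTYPE_IPV6)
                        return (eth);   /* dst MAC is eth[0..5] */
                eth -= VLAN_TAG_LEN;    /* assume another VLAN tag */
        }
        return (NULL);
}

/* The caller then tests (dst[0] & 0x01) to flag the packet as multicast. */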

However, in the PPPoE case, the Ethernet frame looks like this:

<ethernet header> -> <pppoe header> -> <ppp header> -> <IP packet>

River Tarnell | 17 Mar 23:10 2011

Re: ip_mdata_to_mhi vs PPPoE


River Tarnell:
> This frame format is passed unmodified to ip_mdata_to_mhi (I have 
> confirmed this using DTrace).  It therefore looks at the PPP header for 
> the ethertype, doesn't find one, and starts scanning backwards over junk 
> data (the PPPoE header) until it finds a match -- i.e., two bytes which 
> are either ETHERTYPE_IP or ETHERTYPE_IPV6.  In most cases, it never 
> will, so it returns (line 7837) without marking the packet as multicast.  

> However, sometimes (effectively randomly), it will find what it thinks 
> is an ethernet header, and then try to interpret the garbage data as the 
> ethernet dst address.  If the garbage happens to have the right bit set, 
> it will mark the packet as multicast, and it will be dropped in 
> ire_recv_forward_v4.

To confirm this I used the DTrace script below, which mimics the 
backward search ip_mdata_to_mhi does.  It produced the following output:

  1  -> ip_mdata_to_mhi                       area to search = 46 bytes
-04: ethertype=aefd src = 71:1b:88:64:11:00 dst = ee:fb:00:23:eb:6c
-08: ethertype=8864 src = 00:23:eb:6c:71:1b dst = d4:85:64:c9:ee:fb
-12: ethertype=eb6c src = 64:c9:ee:fb:00:23 dst = d4:85:64:c9:d4:85
-16: ethertype=eefb src = 64:c9:d4:85:64:c9 dst = 08:00:45:00:d4:85
-20: ethertype=d485 src = 45:00:d4:85:64:c9 dst = 81:00:00:04:08:00 Multicast!
-24: ethertype=d485 src = 00:04:08:00:45:00 dst = 64:c9:ee:fb:81:00
-28: ethertype=800 src = ee:fb:81:00:00:04 dst = 2d:1a:d4:85:64:c9 IP!  Multicast!
-32: ethertype=8100 src = d4:85:64:c9:ee:fb dst = 00:08:5d:13:2d:1a
-36: ethertype=64c9 src = 5d:13:2d:1a:d4:85 dst = 65:20:63:61:00:08 Multicast!
-40: ethertype=2d1a src = 63:61:00:08:5d:13 dst = 6e:20:74:68:65:20
-44: ethertype=8 src = 74:68:65:20:63:61 dst = 65:64:20:6f:6e:20 Multicast!

Erik Nordmark | 18 Mar 01:05 2011

Re: ip_mdata_to_mhi vs PPPoE


On 3/17/11 2:37 PM, River Tarnell wrote:
> I believe the fix is for sppp (or sppptun) to mark the packet as IFT_PPP
> rather than IFT_ETHER.  This would cause ip_mdata_to_mhi to ignore it
> entirely, and since at this point the PPPoE frame cannot be multicast or
> broadcast, there would be no loss of functionality.

Yes, a driver which doesn't use Ethernet headers shouldn't claim to be
an Ethernet in the dl_info_ack.

    Erik
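
For anyone following along, what a driver "claims to be" is the
dl_mac_type field of the DL_INFO_ACK it returns.  A minimal sketch of
querying that from userland is below; the device path is only an
example, and error handling is trimmed:

#include <sys/types.h>
#include <sys/stat.h>
#include <sys/dlpi.h>
#include <stropts.h>
#include <fcntl.h>
#include <stdio.h>

int
main(int argc, char **argv)
{
        const char *dev = (argc > 1) ? argv[1] : "/dev/sppptun"; /* example */
        uint32_t buf[256];
        dl_info_req_t req;
        dl_info_ack_t *ack = (dl_info_ack_t *)buf;
        struct strbuf ctl;
        int flags = 0;
        int fd = open(dev, O_RDWR);

        if (fd < 0) {
                perror("open");
                return (1);
        }

        /* send DL_INFO_REQ */
        req.dl_primitive = DL_INFO_REQ;
        ctl.len = sizeof (req);
        ctl.buf = (char *)&req;
        if (putmsg(fd, &ctl, NULL, 0) < 0) {
                perror("putmsg");
                return (1);
        }

        /* read back the DL_INFO_ACK */
        ctl.maxlen = sizeof (buf);
        ctl.buf = (char *)buf;
        if (getmsg(fd, &ctl, NULL, &flags) < 0) {
                perror("getmsg");
                return (1);
        }

        if (ack->dl_primitive == DL_INFO_ACK)
                printf("dl_mac_type = 0x%x (%s)\n",
                    (unsigned int)ack->dl_mac_type,
                    ack->dl_mac_type == DL_ETHER ? "DL_ETHER" : "not DL_ETHER");
        return (0);
}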

Garrett D'Amore | 18 Mar 03:12 2011

Re: ip_mdata_to_mhi vs PPPoE

On Thu, 2011-03-17 at 17:05 -0700, Erik Nordmark wrote:
> On 3/17/11 2:37 PM, River Tarnell wrote:
> > I believe the fix is for sppp (or sppptun) to mark the packet as IFT_PPP
> > rather than IFT_ETHER.  This would cause ip_mdata_to_mhi to ignore it
> > entirely, and since at this point the PPPoE frame cannot be multicast or
> > broadcast, there would be no loss of functionality.
> 
> Yes, a driver which doesn't use Ethernet headers shouldn't claim to be
> an Ethernet in the dl_info_ack.
> 
>     Erik

Agreed 100%.  From what we can see, this *appears* to be done for the benefit
of snoop, but I'd love to hear from Jim Carlson or someone else familiar
with the code.

My gut instinct here is that it would be better to define DL_PPP and have the
PPP code use it.  Modifying snoop to support that would take two additional
lines in a table in snoop_ether.c, since we already have support for
interpreting PPP headers (for PPPoE) in the code...

What are we missing?

	- Garrett

James Carlson | 18 Mar 12:40 2011

Re: ip_mdata_to_mhi vs PPPoE

Garrett D'Amore wrote:
> On Thu, 2011-03-17 at 17:05 -0700, Erik Nordmark wrote:
>> On 3/17/11 2:37 PM, River Tarnell wrote:
>>> I believe the fix is for sppp (or sppptun) to mark the packet as IFT_PPP
>>> rather than IFT_ETHER.  This would cause ip_mdata_to_mhi to ignore it
>>> entirely, and since at this point the PPPoE frame cannot be multicast or
>>> broadcast, there would be no loss of functionality.
>> Yes, a driver which doesn't use Ethernet headers shouldn't claim to be
>> an Ethernet in the dl_info_ack.
>>
>>     Erik
> 
> Agreed 100%.  What we see is this *appears* to be done for the benefit
> of snoop, but I'd love to hear from Jim Carlson or someone else familiar
> with the code.

Sigh.

Yes, it came that way from the open source, and we discussed changing it
-- at some length.

The reason the open source code was originally written that way was that
the Solaris IP stack had no decent support for point-to-point links.  It
generally assumed everything was either Ethernet or FDDI.  And the
assumptions ran all over the system.  For instance, snoop assumed every
DLPI device was Ethernet-like and would drop core without that header in
place.  And there were several system daemons that had built-in
knowledge of DLPI.  Faking it was easier than rewriting the world --
especially when the authors of the PPP code didn't have access to the
Solaris source and couldn't fix its problems.

Sebastien Roy | 18 Mar 14:57 2011

Re: ip_mdata_to_mhi vs PPPoE

On 03/18/11 07:40 AM, James Carlson wrote:
> Garrett D'Amore wrote:
>> On Thu, 2011-03-17 at 17:05 -0700, Erik Nordmark wrote:
>>> On 3/17/11 2:37 PM, River Tarnell wrote:
>>>> I believe the fix is for sppp (or sppptun) to mark the packet as IFT_PPP
>>>> rather than IFT_ETHER.  This would cause ip_mdata_to_mhi to ignore it
>>>> entirely, and since at this point the PPPoE frame cannot be multicast or
>>>> broadcast, there would be no loss of functionality.
>>> Yes, a driver which doesn't use Ethernet headers shouldn't claim to be
>>> an Ethernet in the dl_info_ack.
>>>
>>>      Erik
>>
>> Agreed 100%.  What we see is this *appears* to be done for the benefit
>> of snoop, but I'd love to hear from Jim Carlson or someone else familiar
>> with the code.
>
> Sigh.
>
> Yes, it came that way from the open source, and we discussed changing it
> -- at some length.
>
> The reason the open source code was originally written that way was that
> the Solaris IP stack had no decent support for point-to-point links.  It
> generally assumed everything was either Ethernet or FDDI.  And the
> assumptions ran all over the system.  For instance, snoop assumed every
> DLPI device was Ethernet-like and would drop core without that header in
> place.  And there were several system daemons that had built in
> knowledge of DLPI.  Faking it was easier than rewriting the world --
> especially when the authors of the PPP code didn't have access to the

