6bone | 4 Jun 2009 07:36

netbsd5 / wm0: device timeout (txfree 3907 txsfree 0 txnext 3408)

hello,

with enabled tso4 hardware capabilities my Intel nic reports

...
wm0: device timeout (txfree 3790 txsfree 0 txnext 3078)
wm0: device timeout (txfree 3905 txsfree 0 txnext 1370)
wm0: device timeout (txfree 3907 txsfree 0 txnext 3408)
...

The connected switch shows link-down/link-up messages.

With disabled tso capabilities everything works well. Intel NICs with 
other chipsets work well even with enabled tso4.
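For reference, the offload in question is toggled per interface with the tso4 / -tso4 options of NetBSD's ifconfig(8). Here is a minimal sketch, assuming the wm0 interface named in this report:

```shell
# Disable TCP/IPv4 segmentation offload on wm0 (the working configuration)
ifconfig wm0 -tso4
# Re-enable it to reproduce the device timeouts
ifconfig wm0 tso4
```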

My hardware:

wm0 at pci9 dev 0 function 0: Intel PRO/1000 PT (82571EB), rev. 6
wm0: interrupting at ioapic0 pin 16
wm0: PCI-Express bus
wm0: 65536 word (16 address bits) SPI EEPROM
wm0: Ethernet address 00:15:17:0e:98:5e
igphy0 at wm0 phy 1: Intel IGP01E1000 Gigabit PHY, rev. 0
igphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto

The nic is connected to a cisco c2960G.

Is it a netbsd bug or maybe a hardware problem?

Thank you for your efforts

SAITOH Masanobu | 4 Jun 2009 10:53

Re: netbsd5 / wm0: device timeout (txfree 3907 txsfree 0 txnext 3408)


 Hello.

 > hello,
 > 
 > with enabled tso4 hardware capabilities my Intel nic reports
 > 
 > ...
 > wm0: device timeout (txfree 3790 txsfree 0 txnext 3078)
 > wm0: device timeout (txfree 3905 txsfree 0 txnext 1370)
 > wm0: device timeout (txfree 3907 txsfree 0 txnext 3408)
 > ...
 > 
 > The connected switch shows link-down/link-up messages.
 > 
 > With disabled tso capabilities everything works well. Intel NICs with
 > other chipsets work well even with enabled tso4.
 > 
 > 
 > My hardware:
 > 
 > wm0 at pci9 dev 0 function 0: Intel PRO/1000 PT (82571EB), rev. 6
 > wm0: interrupting at ioapic0 pin 16
 > wm0: PCI-Express bus
 > wm0: 65536 word (16 address bits) SPI EEPROM
 > wm0: Ethernet address 00:15:17:0e:98:5e
 > igphy0 at wm0 phy 1: Intel IGP01E1000 Gigabit PHY, rev. 0
 > igphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
 > 
 > The nic is connected to a cisco c2960G.

Petar Bogdanovic | 7 Jun 2009 11:38

[Fwd: Re: dccifd: restart after signal 6]

Hi,

attached is an excerpt from the [1]DCC mailing list.  One of the dcc
daemons started to die unexpectedly because it used _res in conjunction
with threads which seems to be a weird combination on NetBSD even if you
assume that your resolver is not thread safe and lock it properly:

/usr/src/include/resolv.h:
	/*
	 * Source and Binary compatibility; _res will not work properly
	 * with multi-threaded programs.
	 */
	extern struct __res_state *__res_state(void);
	#define _res (*__res_state())

/usr/src/lib/libpthread/res_state.c:
	/*
	 * This is aliased via a macro to _res; don't allow multi-threaded programs
	 * to use it.
	 */
	res_state
	__res_state(void)
	{
		static const char res[] = "_res is not supported for multi-threaded"
		    " programs.\n";
		(void)write(STDERR_FILENO, res, sizeof(res) - 1);
		abort();
		return NULL;
	}


Serban Bogdan | 7 Jun 2009 12:10

diffserv altq.conf file


Hello

I want to know how the rules used for marking a packet are evaluated in the altq.conf file. Does it work like
pf.conf, where the rules are evaluated one by one until the last one (unless the quick keyword is used) and the
last matching rule is applied? I want to build a (large) configuration file for marking the TOS field of the IP
header, because I want to compare the diffserv utilities of the BSD kernel with other black-box systems that
perform this task.

I look forward to a response; if I am not clear enough, please tell me.
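The pf.conf behaviour described in the question can be made concrete with a short model. It covers only the pf-style order: every rule is checked, the last match decides, and a matching quick rule stops evaluation early. `struct rule` and `pf_style_decide()` are hypothetical names. Whether altq.conf filters follow the same order is the open question here, and this code does not settle it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy rule: matches packets whose key lies in [lo, hi]. */
struct rule {
    int  lo, hi;   /* inclusive match range on a toy packet key */
    bool quick;    /* pf's "quick": stop at this rule if it matches */
    int  dscp;     /* marking the rule would apply (not used by the decision) */
};

/* Return the index of the rule that decides pkt under pf-style
 * "last match wins unless quick" evaluation, or -1 if none matches. */
int pf_style_decide(const struct rule *rules, size_t n, int pkt)
{
    assert(rules != NULL || n == 0);
    int last = -1;
    for (size_t i = 0; i < n; i++) {
        if (pkt < rules[i].lo || pkt > rules[i].hi)
            continue;          /* no match: keep looking */
        last = (int)i;         /* remember the latest match */
        if (rules[i].quick)
            break;             /* a quick match ends evaluation here */
    }
    return last;
}
```

Consider the rules {0-100, not quick}, {50-60, quick}, {0-100, not quick}. Key 55 is decided by rule 1 (the quick rule). Key 20 falls through to the last rule, rule 2.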

Best regards,

Bogdan


Christos Zoulas | 8 Jun 2009 02:31

Re: [Fwd: Re: dccifd: restart after signal 6]

In article <20090607093815.GA4811 <at> pintail.smokva.net>,
Petar Bogdanovic  <petar <at> smokva.net> wrote:

How many years will it take people to switch from res_foo to res_nfoo
which is re-entrant? Really, these functions have been there for
more than a decade. There is no excuse for a multi-threaded program
not to use them if they are available and instead use their own
API and do their own locking. What is next? Don't use the _r functions
and do your own locking? Add mutexes to protect a possibly non-thread-safe
malloc? Unless of course I am missing something, and if so I apologize.
99% of the programs that use _res in a multi-threaded environment do
so incorrectly, and it is a good thing to make them use the new API's.
Someone made a mistake exposing _res a *long* time ago. Let's not
perpetuate it by having new code try to use it.
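The suggestion above can be sketched as follows. Each thread keeps its own `struct __res_state`, initialised with res_ninit() and passed to the res_n* functions, so nothing ever touches the global _res. This sketch assumes a C11 compiler (for `_Thread_local`) and a `<resolv.h>` that declares res_ninit() and res_nclose(). `thread_resolver()` and `thread_resolver_close()` are hypothetical helper names.

```c
#define _DEFAULT_SOURCE 1  /* glibc: expose the BSD types used by <resolv.h> */
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <arpa/nameser.h>
#include <resolv.h>

/* One resolver state per thread, so no thread uses the global _res.
 * Thread-local storage is an assumption of this sketch; a caller-owned
 * struct passed down the call chain works just as well. */
static _Thread_local struct __res_state tls_res;
static _Thread_local int tls_res_ready;

/* Return this thread's resolver state, initialising it with res_ninit()
 * on first use. Returns NULL if initialisation fails. */
res_state thread_resolver(void)
{
    if (!tls_res_ready) {
        memset(&tls_res, 0, sizeof tls_res);
        if (res_ninit(&tls_res) != 0)
            return NULL;
        tls_res_ready = 1;
    }
    return &tls_res;
}

/* Release this thread's resolver state; call before the thread exits. */
void thread_resolver_close(void)
{
    if (tls_res_ready) {
        res_nclose(&tls_res);
        tls_res_ready = 0;
    }
}
```

A thread would then pass thread_resolver() to calls such as res_nquery() in place of the corresponding res_* calls, after checking that it did not return NULL.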

christos

>Hi,
>
>attached is an excerpt from the [1]DCC mailing list.  One of the dcc
>daemons started to die unexpectedly because it used _res in conjunction
>with threads which seems to be a weird combination on NetBSD even if you
>assume that your resolver is not thread safe and lock it properly:
>
>/usr/src/include/resolv.h:
>	/*
>	 * Source and Binary compatibility; _res will not work properly
>	 * with multi-threaded programs.
>	 */
>	extern struct __res_state *__res_state(void);

Vernon Schryver | 8 Jun 2009 16:25

Re: [Fwd: Re: dccifd: restart after signal 6]

> To: tech-net <at> netbsd.org
> From:  christos <at> astron.com (Christos Zoulas)

> How many years will it take people to switch from res_foo to res_nfoo
> which is re-entrant? Really, these functions have been there for
> more than a decade. There is no excuse for a multi-threaded program
> not to use them if they are available and instead use their own
> API and do their own locking. What is next? Don't use the _r functions
> and do your own locking? Add mutexes to protect a possibly non-thread-safe
> malloc? Unless of course I am missing something, and if so I apologize.
> 99% of the programs that use _res in a multi-threaded environment do
> so incorrectly, and it is a good thing to make them use the new API's.
> Someone made a mistake exposing _res a *long* time ago. Let's not
> perpetuate it by having new code try to use it.

It is crazy to expect people to use undocumented facilities.
"res_init", "res_send", and so forth appear in `man 3 resolver` on
"NetBSD 4.0.1 (GENERIC) " but the strings "nres" and "thread" do
not appear even once.

Then there is the issue of how the resolver state structure set by
res_nfoo is supposed to be communicated to the resolver inside
gethostbyname().  Putting values into a per-thread, caller-maintained
structure does nothing unless the library code knows to use the per-thread
structure instead of the common global structure.  Have you bothered
to check that changing values with res_nfoo has any effect?  Maybe by setting
the retry limits to 1 retransmission and 1 second and then seeing how soon
a failure happens when you try to resolve a domain whose authoritative
server does not answer at all?  (I use timeo.rhyolite.com for such tests)
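The experiment described above can be sketched the same way. The sketch gives a private resolver state a 1-second retransmission interval and a single retry, then times a query against a name whose server never answers. It assumes `<resolv.h>` provides res_ninit() and res_nquery(), and that `<arpa/nameser.h>` provides the ns_c_in and ns_t_a constants. On older glibc, linking may need -lresolv. `init_tight_resolver()` and `time_lookup()` are hypothetical names. Only calls that take the state pointer can see these limits, which is the point raised here about gethostbyname().

```c
#define _DEFAULT_SOURCE 1  /* glibc: expose the BSD types used by <resolv.h> */
#include <assert.h>
#include <string.h>
#include <time.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <arpa/nameser.h>
#include <resolv.h>

/* Initialise a private resolver state with deliberately tight limits.
 * Returns 0 on success, -1 if res_ninit() fails. */
int init_tight_resolver(res_state statp)
{
    assert(statp != NULL);
    memset(statp, 0, sizeof *statp);
    if (res_ninit(statp) != 0)
        return -1;
    statp->retrans = 1;   /* seconds between retransmissions */
    statp->retry = 1;     /* number of transmission attempts */
    return 0;
}

/* Time one A-record lookup through the given state, in seconds.
 * The res_nquery() return value (-1 on failure) is stored in *rc. */
double time_lookup(res_state statp, const char *name, int *rc)
{
    unsigned char answer[NS_PACKETSZ];
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    *rc = res_nquery(statp, name, ns_c_in, ns_t_a, answer, sizeof answer);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

For instance, call init_tight_resolver(&st), then time_lookup(&st, "timeo.rhyolite.com", &rc). If the limits are honoured, the lookup should fail quickly. Timing gethostbyname() on the same name is the comparison asked about above.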


Christos Zoulas | 8 Jun 2009 17:00

Re: [Fwd: Re: dccifd: restart after signal 6]

On Jun 8,  2:25pm, vjs <at> calcite.rhyolite.com (Vernon Schryver) wrote:
-- Subject: Re: [Fwd: Re: dccifd: restart after signal 6]

| > To: tech-net <at> netbsd.org
| > From:  christos <at> astron.com (Christos Zoulas)
| 
| > How many years will it take people to switch from res_foo to res_nfoo
| > which is re-entrant? Really, these functions have been there for
| > more than a decade. There is no excuse for a multi-threaded program
| > not to use them if they are available and instead use their own
| > API and do their own locking. What is next? Don't use the _r functions
| > and do your own locking? Add mutexes to protect a possibly non-thread-safe
| > malloc? Unless of course I am missing something, and if so I apologize.
| > 99% of the programs that use _res in a multi-threaded environment do
| > so incorrectly, and it is a good thing to make them use the new API's.
| > Someone made a mistake exposing _res a *long* time ago. Let's not
| > perpetuate it by having new code try to use it.
| 
| It is crazy to expect people to use undocumented facilities.
| "res_init", "res_send", and so forth appear in `man 3 resolver` on
| "NetBSD 4.0.1 (GENERIC) " but the strings "nres" and "thread" do
| not appear even once.

Yes, this is NetBSD's fault. The documentation is there, it just has
not been updated. I will fix it right now.

| Then there is the issue of how the resolver state structure set by
| res_nfoo is supposed to be communicated to the resolver inside
| gethostbyname().  Putting values into a per-thread, caller-maintained
| structure does nothing unless the library code knows to use the per-thread

Petar Bogdanovic | 8 Jun 2009 17:11

Re: [Fwd: Re: dccifd: restart after signal 6]

On Mon, Jun 08, 2009 at 11:00:28AM -0400, Christos Zoulas wrote:
> 
> | Finally DO NOT WRITE ME about this stuff.  I care only about the
> | code.  I'm not interested in joining a djb style,
> | we-re-so-wonderful-because-we-tell-each-other-so cult.
> | The nature of NetBSD in this decade is clear regardless of this
> | _res silliness.
> 
> I did not write you. I replied to a question on why _res aborts in
> multi-threaded programs in the mailing list.

I apologize, that was me.  I thought you had forgotten to cc him, and it did
not occur to me that doing so could provoke such an outrage.  Sorry!

   Petar Bogdanovic

David Laight | 8 Jun 2009 23:48

Re: [Fwd: Re: dccifd: restart after signal 6]

On Mon, Jun 08, 2009 at 11:00:28AM -0400, Christos Zoulas wrote:
> 
> | That you call abort() after writing to stderr is emblematic of
> | the NetBSD problems.  It would be dicey to call syslog(), but simply
> | assuming that stderr has not been long since closed is at best far
> | too naive for anyone allowed to touch a libc source tree.
> 
> Well, I agree with you, but this is not the only function that prints
> errors to stderr in libc; and if stderr is closed, you could easily
> look at the backtrace in the debugger to find out what went wrong.

And they should all be nuked ....

Many years ago we spent ages locating a problem that was actually
a spurious printf() call (it should have been an sprintf() call).
Since the generated data didn't include a '\n', and stdout is line
buffered, nothing untoward happened until the code had run enough
times to fill the stdio buffer.
Then the data buffer was written to fd 1.
The program had earlier called close(1), and was reusing fd 1 as a
control pipe to another daemon - the text broke the protocol ....

Ok, if the program had done fclose(stderr) instead of close(1)
then it wouldn't have been a problem - but many daemons will
just do close(0); close(1); close(2); and assume that because
they don't access stdin, stdout or stderr that nothing else will.
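This failure mode can be avoided by pointing fds 0-2 at /dev/null instead of merely closing them. open(), pipe() and socket() return the lowest free descriptor, so once 0-2 are occupied, later descriptors can never be 0, 1 or 2. This is a minimal POSIX sketch; `null_std_fds()` is a hypothetical helper, not a libc function.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: point fds 0, 1 and 2 at /dev/null (even if they are
 * currently open), so a stray printf() or libc diagnostic cannot land on a
 * descriptor the daemon later reuses for a pipe or socket.
 * Returns 0 on success, -1 on failure. */
int null_std_fds(void)
{
    int fd = open("/dev/null", O_RDWR);
    if (fd == -1)
        return -1;

    for (int target = STDIN_FILENO; target <= STDERR_FILENO; target++) {
        if (fd == target)
            continue;                 /* open() already landed here */
        if (dup2(fd, target) == -1) {
            if (fd > STDERR_FILENO)
                close(fd);
            return -1;
        }
    }

    /* The original descriptor is only needed if it is itself one of 0-2. */
    if (fd > STDERR_FILENO)
        close(fd);
    return 0;
}
```

BSD daemon(3) performs the same redirection when its noclose argument is zero.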

	David


Hauke Fath | 9 Jun 2009 13:43

How usable is agr(4)?

All,

while upgrading a busy NFS fileserver, I changed it to aggregate
two wm(4) GBit interfaces with agr(4); on the other end is an HP
ProCurve 2848 switch.

After a few days, a 'netstat -i' gives me

Name  Mtu   Network       Address               Ipkts Ierrs     Opkts Oerrs Colls
wm0   1500  <Link>        00:30:48:d7:0a:78 349626955    23 338123451     0     0
wm1   1500  <Link>        00:30:48:d7:0a:79      7958     0      7955     0     0
agr0  1500  <Link>        00:30:48:d7:0a:78 349620694    16 338117217  3516     2

which is not really balanced. Is this what I should expect? Or what 
am I missing?

	hauke

-- 
      The ASCII Ribbon Campaign                    Hauke Fath
()     No HTML/RTF in email            Institut für Nachrichtentechnik
/\     No Word docs in email                     TU Darmstadt
      Respect for open standards              Ruf +49-6151-16-3281

