Srivas Chennu | 3 Jul 2006 17:03

e1000 driver timeout with 2.6.x

Hello all,

I'm a relatively new Click user trying to build and test a link-layer
protocol using Click. My test runs use the Click kernel module built
from the latest CVS sources. On a patched 2.6.16.13 kernel with an
original Intel PRO/1000 MT dual-port GbE NIC, running a Click configuration
that uses FromDevice, the driver abruptly times out during Tx and resets
with messages like those below:

e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang
Tx Queue             <0>
TDH                  <97>
TDT                  <9a>
next_to_use          <9a>
next_to_clean        <95>
buffer_info[next_to_clean]
time_stamp           <e66ae>
next_to_watch        <97>
jiffies              <e6b71>
next_to_watch.status <0>
....
....
Eventually I see in the log file:

NETDEV WATCHDOG: eth1: transmit timed out
e1000: eth1: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex
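
For anyone reading the dump above, here is a minimal sketch of the arithmetic behind it. It assumes TDH/TDT are the hardware head and tail indices of the Tx descriptor ring, that time_stamp is the jiffies value recorded when the buffer was queued, and that the ring holds 256 descriptors (the ring size is an assumption, not something the log shows). The function names are hypothetical and are not taken from the e1000 driver.

```cpp
#include <cstdint>

// Hypothetical helper: number of descriptors handed to the NIC but not yet
// fetched, given head and tail indices in a ring of ring_size entries.
std::uint32_t ring_pending(std::uint32_t head, std::uint32_t tail,
                           std::uint32_t ring_size) {
    // Adding ring_size before subtracting keeps the value non-negative
    // when the tail index has wrapped around past the head.
    return (tail + ring_size - head) % ring_size;
}

// Hypothetical helper: age of a queued buffer in jiffies. Unsigned
// subtraction stays correct across a single wrap of the counter.
std::uint32_t jiffies_age(std::uint32_t now, std::uint32_t stamp) {
    return now - stamp;
}
```

With the values from the log (TDH 0x97, TDT 0x9a, time_stamp 0xe66ae, jiffies 0xe6b71) this gives 3 pending descriptors and an age of 0x4c3 = 1219 jiffies. How many seconds that is depends on the kernel's HZ setting, which the log does not show.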

Interestingly, this timeout-and-reset problem does not occur when
running my Click configuration at user level, but it reproduces quite
easily with the kernel module, even when the NIC is working at low
[...]

Beyers Cronje | 3 Jul 2006 17:18

Re: e1000 driver timeout with 2.6.x

Hi Srivas,

This is a problem that Adam, a few others and I have been struggling with.
It is strange that FromDevice gives you the TX hang, as on my system it only
happens when using PollDevice in certain configurations. If possible, can you
post the Click config you are using to reproduce the hang?

Adam pointed me to the e1000 dev mailing list on SourceForge, and the TX hang
issue seems to pop up on standard Linux (non-Click) systems as well. One
possible workaround is to disable TCP segmentation offloading (TSO). You can
do this via 'ethtool -K eth0 tso off', but it seems to work only
sometimes ...

What e1000 driver version are you using? Since you are only using FromDevice,
have you tried the latest e1000 driver?

Anyone else also having this problem?

Beyers

Srivas Chennu | 3 Jul 2006 19:34

Re: e1000 driver timeout with 2.6.x

Hello Beyers,

Thanks a lot for your speedy response. To answer your question regarding
the e1000 driver: I've downloaded and tested my configuration with the
latest stable release (7.1.9) from SourceForge, and the timeout
stubbornly continues to occur even with the TSO option disabled.

For your reference, the possibly relevant snippet of my Click
configuration is included below. It uses a Click element (onuagent) that
I've written to emulate the protocol being tested. The element receives
and forwards packets between 3 interfaces via customized priority
schedulers.

...
FromDevice($rp0, PROMISC true) -> [0]onuagent;
onuagent[0] -> priosched0 -> ToDevice($rp0);
FromDevice($rp1, PROMISC true) -> [1]onuagent;
onuagent[1] -> priosched1 -> ToDevice($rp1);
FromDevice($lp, PROMISC true) -> [2]onuagent;
onuagent[2] -> priosched2 -> ToDevice($lp);
...

I'm currently attempting to find a combination of a kernel (2.4.x or
2.6.x) and a stable e1000 driver version with which I can reliably use
FromDevice/PollDevice. Any details of a setup that has worked for you in
this regard would be helpful.

Thanks in advance,
Srivas.

Venky Rama | 4 Jul 2006 03:44

division - kernel novice question...

Since division is not allowed inside the kernel, how does Click implement division operations?

  I am trying to read the code for RED, and it has
  RED::set_C1_and_C2(), where division seems to be done...?

  _C2 = (_max_p * _min_thresh) / (_max_thresh - _min_thresh);

  Any guidance will be appreciated. Thank you.

  venky

 		
Roman Chertov | 4 Jul 2006 05:47

Re: division - kernel novice question...

Division is allowed as long as you don't use floats or 64-bit longs.
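
As a rough illustration of that point, here is a minimal sketch of a RED-style slope and offset computed with 32-bit integer arithmetic only. It assumes the drop probability is stored as a scaled integer where 0xFFFF stands for 1.0. That scale, the struct and the function name are assumptions made for this example and are not copied from Click's RED element.

```cpp
#include <cstdint>

// Assumed fixed-point scale: probability 1.0 is stored as 0xFFFF.
constexpr std::uint32_t kProbOne = 0xFFFF;

// Hypothetical container for the two derived constants.
struct RedConstants {
    std::uint32_t c1;  // slope: scaled probability per unit of queue length
    std::uint32_t c2;  // offset: slope contribution at min_thresh
};

// Computes both constants with plain 32-bit integer division, so no
// floating point and no 64-bit division helper is needed.
RedConstants compute_red_constants(std::uint32_t min_thresh,
                                   std::uint32_t max_thresh,
                                   std::uint32_t max_p_scaled) {
    RedConstants c{0, 0};
    if (max_thresh <= min_thresh)
        return c;  // degenerate range: avoid dividing by zero
    std::uint32_t range = max_thresh - min_thresh;
    c.c1 = max_p_scaled / range;
    // The product must fit in 32 bits; with max_p_scaled <= 0xFFFF this
    // holds as long as min_thresh is below 65536.
    c.c2 = (max_p_scaled * min_thresh) / range;
    return c;
}
```

For example, a maximum probability of 0.1 can be written without floats as kProbOne / 10, and compute_red_constants(5, 50, kProbOne / 10) then yields c1 = 145 and c2 = 728. Integer division truncates, so the results are approximations of the real-valued slope and offset.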

Bart Braem | 5 Jul 2006 16:54

nsclick: clickclassifier new[] delete[] mismatch

Hello,

I noticed a serious error in click-classifier.cc, both for ns-2.26 and ns-2.29.
In the method void ClickClassifier::LinkLayerFailed(Packet* p), the data array
is allocated with an array new (new ... [len]) but is freed in the same method
with a plain delete. This gave me a floating point error, and it can be
corrected with the patch below.

--- ns-2.29-patch.orig  2006-07-05 16:47:47.000000000 +0200
+++ ns-2.29-patch       2006-07-05 16:48:04.000000000 +0200
@@ -636,7 +636,7 @@
 +    simstate.curtime = GetSimTime();
 +    //fprintf(stderr,"Sending packet up to click...\n");
 +    simclick_click_send(clickinst_,&simstate,ifid,clicktype,data,len,&simpinfo);
-+    delete data;
++    delete[] data;
 +    data = 0;
 +  }
 +  else {
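
To illustrate the rule the patch relies on: memory obtained with new[] has to be released with delete[], and pairing it with a plain delete is undefined behavior in C++. Below is a minimal sketch with hypothetical helper names (they do not appear in click-classifier.cc), followed by a std::vector variant that avoids manual release altogether.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: copies a payload into a buffer allocated with new[].
unsigned char* copy_payload(const unsigned char* src, std::size_t len) {
    unsigned char* data = new unsigned char[len];
    std::memcpy(data, src, len);
    return data;
}

// The matching release: array form, because the buffer came from new[].
void release_payload(unsigned char* data) {
    delete[] data;
}

// Alternative sketch: let std::vector own the bytes, so there is no
// delete to get wrong.
std::vector<unsigned char> copy_payload_vec(const unsigned char* src,
                                            std::size_t len) {
    return std::vector<unsigned char>(src, src + len);
}
```

Whether a container fits depends on what the receiving API expects. The patch above keeps the raw pointer and only corrects the delete form.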

Regards,
Bart
-- 
Bart Braem
PATS research group
Dept. of Mathematics and Computer Sciences
University of Antwerp
G2.36, Building G
Middelheimlaan 1

Robert Owen | 7 Jul 2006 00:22

RSSI values from click

Hi there,

I hope this is not off-topic for this list.
I did some RSSI measurements using Click, and found that the RSSI values
barely change once the transmission power exceeds a certain threshold.
This seems to be contrary to what one would expect.

Here are the experiment setup and results:
I have two nodes, about 6-7 feet apart, one broadcasting and the other
receiving.
The sender broadcasts 1000 packets at 1-second intervals.

tx power (mW)   1       5       10      20      30      40      50      60
mean RSSI       23.314  23.331  25.686  32.149  35.778  35.758  35.733  35.693

Initially, there is only a slight RSSI increase as txpower goes from 1mW to 5mW.
Then the increase looks normal for txpower from 5mW to 30mW.
But from 30mW on, RSSI barely changes.

I wonder if somebody out there has done similar RSSI measurements using Click.
Is it possible that increasing the power beyond 30mW does not take effect at all?
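
As a point of comparison for the table, here is a minimal sketch of the gain one would expect from the power settings alone. It assumes the reported RSSI scales roughly like a dB quantity, which is an assumption about the driver and not something established in this thread. The function names are hypothetical.

```cpp
#include <cmath>

// Converts a power in milliwatts to dBm: 10 * log10(P / 1 mW).
double mw_to_dbm(double mw) {
    return 10.0 * std::log10(mw);
}

// Expected change in received level, in dB, when only the transmit
// power changes and the path between the nodes stays the same.
double expected_gain_db(double from_mw, double to_mw) {
    return mw_to_dbm(to_mw) - mw_to_dbm(from_mw);
}
```

Under that assumption, doubling the power from 30mW to 60mW should add about 3 dB, and going from 1mW to 5mW about 7 dB. The flat readings at both ends of the table fit neither figure, which is consistent with the suspicion that some of the power settings are not taking effect.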

Thank you!

Robert Owen | 7 Jul 2006 00:33

Fwd: RSSI values from click

I might have sent it to a wrong address.
My apologies if I'm flooding the list with another copy.

---------- Forwarded message ----------
From: Robert Owen <robyowen <at> gmail.com>
Date: Jul 6, 2006 5:22 PM
Subject: RSSI values from click
To: click <at> pdos.csail.mit.edu

Massimiliano Poletto | 6 Jul 2006 04:06

Re: e1000 driver timeout with 2.6.x

Hi Srivas and Beyers, I've spent some time looking at drivers again recently.

What works best for me at present is a patched version of the 6.1.16.2
Intel driver (not the 6.3.9-k4 driver that comes with linux
2.6.16.13).  I attach the driver sources and patch to this email.  I'm
using a 2.6.16.22 kernel, but I don't see why .13 should work any less
well with the driver.

Performance is good, and it is stable across hundreds of
installs/uninstalls and many hours of testing at full line rate
offered load.  I sometimes see messages similar to yours (below is an
example), but they only seem to happen during stress tests when click
is repeatedly installed/uninstalled at very short intervals:
e1000: eth2: e1000_clean_tx_irq: Detected Tx Unit Hang
  TDH                  <b0>
  TDT                  <b0>
  next_to_use          <b0>
  next_to_clean        <9e>
buffer_info[next_to_clean]
  dma                  <2c66c040>
  time_stamp           <0>
  next_to_watch        <0>
  jiffies              <82ddae8>
  next_to_watch.status <0>

I'm trying to get a sourceforge 7.x driver to work, but for now this
seems at least workable.

Please let me know if you have problems with this driver, or if you
make other progress yourselves.

Beyers Cronje | 11 Jul 2006 17:00

Re: e1000 driver timeout with 2.6.x

Hi Max!

Thank you very much, I should have time later this week to test and will let
you know how it runs.

Cheers

Beyers Cronje
