Adrian Chadd | 29 May 20:14 2016

MCI howto?

hiya!

I'm trying to bring up MCI bluetooth coexistence on the QCA9565 on
FreeBSD. I'm having no end of trouble - it seems that no matter what I
do, the bluetooth always wins and the wifi side disconnects when the
bt does an active scan (HCI "inquiry").

Does anyone remember the deep and amusing details about MCI support
and how to debug this? I'm sure it's something stupid, but my time at
QCA involved 2-wire and 3-wire coex (which you can debug with a
scope), not MCI :(

Thanks!

-adrian
Robert Smith | 24 May 15:39 2016

Adaptivity issues

Hi,

We've recently tried to compliance-test our Atheros AR9344 based product against EN 300 328 and EN 301 893.

We are running the OpenWrt Chaos Calmer stable release, which uses the ath9k included in compat-wireless-2016-01-10; we are only using the on-SoC radio.

It fails the adaptivity testing in both EN 300 328 and EN 301 893: it does not appear to back off transmissions when other interferers are introduced.

We've also tried running a version of Barrier Breaker and have seen similar results.

Does anyone have any guidance or suggestions on how to pass this compliance testing using ath9k?

Thanks in advance

Robert Smith

Eseye Limited - 8 Frederick Sanger Road - Surrey Research Park - Guildford - Surrey - GU2 7YD
Call for direct access to our sales and support teams:
• +44 1483 802501 (International and UK sales)
• +44 1483 802503 (International and UK Technical Support)
• +33 9 87 67 53 37 (France)
• +1 484-935-3130 (US)
• +61 8 9551 5200 (Australia)
ISO27001:2013 Certified
Alexander Couzens | 20 May 02:31 2016

[PATCH] ath9k: Update AR9340 initvals for txgain table mode 4

Change the table for 5 GHz HT20/HT40 to fix a 10 dBm signal difference
on the client side when running LEDE on the AP (TP-Link CPE510) instead
of the vendor driver. These values were taken from a TP-Link CPE510
running the vendor firmware, using a small script that retrieved the
same registers as are already defined in the tx gain table.

Signed-off-by: Alexander Couzens <lynxis <at> fe80.eu>
---

Because I don't have any datasheets for this wireless chip, I cannot
really say what I'm changing.
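
For anyone wanting to cross-check on a live ath9k system: the driver
exposes raw register access via debugfs, so the changed values can be
read back roughly like this (paths assume the mainline ath9k debugfs
layout; repeat per register):

  # select one of the registers touched below, then read its value
  echo 0x0000a410 > /sys/kernel/debug/ieee80211/phy0/ath9k/regidx
  cat /sys/kernel/debug/ieee80211/phy0/ath9k/regval

(The vendor firmware presumably has its own register-access tool; the
script mentioned above is not included here.)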

 drivers/net/wireless/ath/ath9k/ar9340_initvals.h | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/wireless/ath/ath9k/ar9340_initvals.h b/drivers/net/wireless/ath/ath9k/ar9340_initvals.h
index 2eb163f..ac5767e 100644
--- a/drivers/net/wireless/ath/ath9k/ar9340_initvals.h
+++ b/drivers/net/wireless/ath/ath9k/ar9340_initvals.h
@@ -827,7 +827,7 @@ static const u32 ar9340Modes_mixed_ob_db_tx_gain_table_1p0[][5] = {
 	{0x0000a2e0, 0x0000f800, 0x0000f800, 0x03ccc584, 0x03ccc584},
 	{0x0000a2e4, 0x03ff0000, 0x03ff0000, 0x03f0f800, 0x03f0f800},
 	{0x0000a2e8, 0x00000000, 0x00000000, 0x03ff0000, 0x03ff0000},
-	{0x0000a410, 0x000050d9, 0x000050d9, 0x000050d9, 0x000050d9},
+	{0x0000a410, 0x000050da, 0x000050da, 0x000050d9, 0x000050d9},
 	{0x0000a500, 0x00000000, 0x00000000, 0x00000000, 0x00000000},
 	{0x0000a504, 0x06000003, 0x06000003, 0x04000002, 0x04000002},
 	{0x0000a508, 0x0a000020, 0x0a000020, 0x08000004, 0x08000004},
@@ -912,12 +912,12 @@ static const u32 ar9340Modes_mixed_ob_db_tx_gain_table_1p0[][5] = {
 	{0x0000b2e0, 0x0000f800, 0x0000f800, 0x03ccc584, 0x03ccc584},
 	{0x0000b2e4, 0x03ff0000, 0x03ff0000, 0x03f0f800, 0x03f0f800},
 	{0x0000b2e8, 0x00000000, 0x00000000, 0x03ff0000, 0x03ff0000},
-	{0x00016044, 0x056db2db, 0x056db2db, 0x03b6d2e4, 0x03b6d2e4},
-	{0x00016048, 0x24925666, 0x24925666, 0x8e481266, 0x8e481266},
+	{0x00016044, 0x056db2e4, 0x056db2e4, 0x03b6d2e4, 0x03b6d2e4},
+	{0x00016048, 0x64925666, 0x64925666, 0x8e481266, 0x8e481266},
 	{0x00016280, 0x01000015, 0x01000015, 0x01001015, 0x01001015},
 	{0x00016288, 0x30318000, 0x30318000, 0x00318000, 0x00318000},
-	{0x00016444, 0x056db2db, 0x056db2db, 0x03b6d2e4, 0x03b6d2e4},
-	{0x00016448, 0x24925666, 0x24925666, 0x8e481266, 0x8e481266},
+	{0x00016444, 0x056db2e4, 0x056db2e4, 0x03b6d2e4, 0x03b6d2e4},
+	{0x00016448, 0x64925666, 0x64925666, 0x8e481266, 0x8e481266},
 };

 static const u32 ar9340Modes_low_ob_db_and_spur_tx_gain_table_1p0[][5] = {
-- 
2.8.2
Mathieu Slabbinck | 17 May 15:52 2016

tx99 memory allocate failure

Hi,

I'm having some issues running the tx99 feature of the ath9k driver.
It seems like every time I try to run it, I get this:
sh: write error: cannot allocate memory

After some poking, I found it's returning -ENOMEM in tx99.c,
which is due to !sc->tx99_skb,
which in turn is due to sc->tx99_vif being NULL.

So essentially it goes wrong in ath9k_build_tx99_skb, which returns NULL at this check:
if (!sc->tx99_vif) {
    return NULL;
}

Does anybody have ideas on what the actual root cause could be and/or how to solve it?

Kernel version: 4.1.0
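
For reference, the sequence I use to start tx99 is roughly this
(assuming CONFIG_ATH9K_TX99 is enabled and the stock debugfs paths):

  iw phy phy0 interface add mon0 type monitor
  ip link set mon0 up
  echo 5 > /sys/kernel/debug/ieee80211/phy0/ath9k/tx99_power
  echo 1 > /sys/kernel/debug/ieee80211/phy0/ath9k/tx99

From my reading of tx99.c, sc->tx99_vif only seems to get set when the
device is brought up with a single monitor-mode interface, so maybe
that vif setup step is where it goes wrong for me?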

Thanks & Kr

Mathieu
Dave Taht | 13 May 22:21 2016

Re: [Make-wifi-fast] Diagram of the ath9k TX path

On Fri, May 13, 2016 at 12:20 PM, Aaron Wood <woody77 <at> gmail.com> wrote:
> On Fri, May 13, 2016 at 11:57 AM, Dave Taht <dave.taht <at> gmail.com> wrote:
>>
>> On Fri, May 13, 2016 at 11:05 AM, Bob McMahon <bob.mcmahon <at> broadcom.com>
>> wrote:
>>>
>>>  don't have the data available for multiple flows at the moment.
>>
>>
>> The world is full of folk trying to make single tcp flows go at maximum
>> speed, with multiple alternatives to cubic.
>
>
> And most web traffic is multiple-flow, even with HTTP/2 and SPDY, due to
> domain/host sharding.  Many bursts of multi-flow traffic.  About the only
> thing that's single-flow is streaming-video (which isn't latency sensitive).

And usually, rate limited. It would be nice if that streaming video actually
fit into a single txop in many cases.

> The only local services that I know of that could use maximal-rate wifi are
> NAS systems using SMB, AFP, etc.

And many of these, until recently, were actually bound by the speed of
their hard disks and by inefficiencies in the protocol.

>
> -Aaron

Useful flent tests for seeing the impact of good congestion control
are tcp_2up_square and tcp_2up_delay.
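
e.g. something like (hostname illustrative; any box running netserver
will do):

  flent tcp_2up_square -H flent-server.example.org -l 60 \
        -p totals -t "ath9k baseline" -o tcp_2up_square.png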

There are also other related tests like "reno_cubic_westwood_cdg"
which try one form of tcp against another. I really should sit down
and write a piece about these, to try to show that one flow grabbing
all the link hurts all successor flows.

Both could be better. I like what the teacup people are doing here,
using 3 staggered flows to show their results.

...

and I misspoke a bit earlier: I meant to say txop where I'd said
ampdu. Multiple ampdus can fit into a txop and, so far as I know, each
can be block acked separately.

https://books.google.com/books?id=XsF5CgAAQBAJ&pg=PA32&lpg=PA32&dq=multiple+ampdus+in+a+txop&source=bl&ots=dRCYcD9rBc&sig=tVocMORuEXBOsfUlcmuSLTdM0Lw&hl=en&sa=X&ved=0ahUKEwiLxurP69fMAhVU5WMKHVejAlUQ6AEIHzAA#v=onepage&q=multiple%20ampdus%20in%20a%20txop&f=false

One thing I don't have a grip on is the airtime cost of packing
multiple ampdus into a txop, in terms of the block acks, and also in
relation to using amsdus as per the 2015 paper referenced off the
"thoughts about airtime fairness" thread that the ath9k list was not
cc'd on.
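
Back of the envelope (numbers hedged): a compressed BlockAck is 32
bytes, so at the 24Mbit legacy rate,

  t_BA ~= 20us preamble + (32*8 bits / 24Mbit/s) ~= 31us
  t_BA + SIFS ~= 31us + 16us ~= 47us

or roughly 1% of a 4ms txop per extra BA exchange - so the BA frame
itself looks cheap, and the interesting cost is presumably in the
extra gaps and the retry/failure cases.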

https://lists.bufferbloat.net/pipermail/make-wifi-fast/2016-May/000661.html

I note that some of my comments on that thread were due to the overly
EE- and math-oriented analysis of the "perfect" solution, but I'm over
that now. :) It was otherwise one of the best recent papers on wifi
I've read, and more people should read it:

 http://www.hindawi.com/journals/misy/2015/548109/

(and all the other cites in that thread were good, too. MIT had the
basics right back in 2003!)

One of my longer term dreams for better congestion control in wifi is
to pack one aggregate in a txop with stuff you care about deeply, and
a second, with stuff you don't (or vice versa).

See also this post, which fills in a personal memory gap from 2004 or
so (when I thought block acks would only be used on critical traffic)
and is where I started going back and reviewing the standard:

http://blog.cerowrt.org/post/selective_unprotect/

-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org
Dave Taht | 13 May 20:57 2016

Re: [Make-wifi-fast] Diagram of the ath9k TX path



On Fri, May 13, 2016 at 11:05 AM, Bob McMahon <bob.mcmahon <at> broadcom.com> wrote:
The graphs are histograms of mpdus per ampdu, from 1 to 64. The blue spikes show that the vast majority of traffic is filling an ampdu with 64 mpdus; the fill-stop reason is "ampdu full". The purple fill-stop reason is that the sw fifo (above the driver) went empty, indicating a too-small CWND for maximum aggregation.

Can I get you to drive this plot, with wifi rate-limited to 6, 20, and 300 Mbit and "native", using flent's tcp_upload and rtt_fair_up tests?

My goal is far different from getting a single tcp flow to max speed: it is to get close to full throughput with multiple flows while not accumulating 2 sec of buffering...


Or even 100ms of it:



Early experiments with getting a good rate estimate "to fill the queue" from rate control info were basically successful. Lacking rate control and using dql only, it currently takes much longer at higher rates, but works well at lower ones.

 
A driver wants to aggregate to the fullest extent possible.  

 While still retaining tcp congestion control. There are other nuances, like being nice
about total airtime to others sharing the media, minimizing retries due to an overlarge ampdu for the current BER, etc.

I don't remember what section of the 802.11-2012 standard this is from, but:

```
Another unresolved issue is how large a concatenation threshold the devices should set. Ideally, the maximum value is preferable but in a noisy environment, short frame lengths are preferred because of potential retransmissions. The A-MPDU concatenation scheme operates only over the packets that are already buffered in the transmission queue, and thus, if the CPR data rate is low, then efficiency also will be small. There are many ongoing studies on alternative queuing mechanisms different from the standard FIFO. *A combination of frame aggregation and an enhanced queuing algorithm could increase channel efficiency further*.
```


  A work around is to set initcwnd in the router table.

Ugh. Um... no... initcwnd 10 is already too large for many networks. If you set your wifi initcwnd to something like 64, what happens to the 5mbit cable uplink just upstream from that?
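
(For reference, that knob is set per-route via iproute2, e.g. - gateway
address illustrative:

  ip route change default via 192.168.1.1 dev eth0 initcwnd 64

- so it is unfortunately very easy to experiment with.)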

There are a couple of other parameters that might be of use - tcp_send_lowat and tcp_limit_output_bytes. These were set off, and originally too low for wifi. A good setting for the latter, for ethernet, was about 4096; then the wifi folk complained, and it got bumped to 64k, and I think it's now at 256k to make the xen folk happier.
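
(Assuming those map to net.ipv4.tcp_notsent_lowat and
net.ipv4.tcp_limit_output_bytes, you can check what your kernel ships
with:

  sysctl net.ipv4.tcp_notsent_lowat net.ipv4.tcp_limit_output_bytes

before tuning anything.)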

These are all workarounds for the real problem, which was not tuning driver queueing to the actually achievable ampdu and not doing fq+aqm to spread the load (essentially "pace" bursts) - which is what is happening in michal's patches.
 
I don't have the data available for multiple flows at the moment. 

The world is full of folk trying to make single tcp flows go at maximum speed, with multiple alternatives to cubic. This quest has resulted in the near elimination of the sawtooth along the edge, horrific overbuffering, a net loss in speed, and a huge perception of "slowness".

Note: I have long figured that a different tcp should be used on wifi uplinks, once we fixed a ton of basic mis-assumptions, and that tcps should become more wifi/wireless-aware; but tweaking initcwnd, tcp_limit_output_bytes, etc., is not the right thing.

There has been some good tcp research published of late, look into "BBR", and "CDG".
 
Note: That will depend on what exactly defines a flow.

Bob 

On Fri, May 13, 2016 at 10:49 AM, Dave Taht <dave.taht <at> gmail.com> wrote:
I try to stress that single tcp flows should never use all the bandwidth, in order for the sawtooth to function properly.

What happens when you hit it with 4 flows? or 12?

nice graph, but I don't understand the single blue spikes?

On Fri, May 13, 2016 at 10:46 AM, Bob McMahon <bob.mcmahon <at> broadcom.com> wrote:
On driver delays: from a driver development perspective, the problem isn't whether to add delay or not (it shouldn't); it's that the TCP stack isn't presenting sufficient data to fully utilize aggregation. Below is a histogram comparing aggregation across 3 systems (units are mpdus per ampdu). The lowest-latency stack is in purple, and it is also the worst with respect to average throughput. From a driver perspective, one would like TCP to present sufficient bytes into the pipe that the histogram leans toward the blue.


I'm not an expert on TCP congestion avoidance, but maybe the algorithm could benefit from using RTT weighted by CWND (or bytes in flight) and hunting for that maximum?

Bob

On Mon, May 9, 2016 at 8:41 PM, David Lang <david <at> lang.hm> wrote:
On Mon, 9 May 2016, Dave Taht wrote:

On Mon, May 9, 2016 at 7:25 PM, Jonathan Morton <chromatix99 <at> gmail.com> wrote:

On 9 May, 2016, at 18:35, Dave Taht <dave.taht <at> gmail.com> wrote:

should we always wait a little bit to see if we can form an aggregate?

I thought the consensus on this front was “no”, as long as we’re making the decision when we have an immediate transmit opportunity.

I think it is more nuanced than how david lang has presented it.

I have four reasons for arguing for no speculative delays.

1. airtime that isn't used can't be saved.

2. lower best-case latency

3. simpler code

4. clean and gradual service degradation under load.

the arguments against are:

5. throughput per ms of transmit time is better if aggregation happens than if it doesn't.

6. if you don't transmit, some other station may choose to before you would have finished.

#2 is obvious, but with the caveat that anytime you transmit you may be delaying someone else.

#1 and #6 are flip sides of each other. we want _someone_ to use the airtime, the question is who.

#3 and #4 are closely related.

If you follow my approach (transmit immediately if you can, aggregate when you have a queue), the code really has one mode (plus queuing): "if you have a Transmit Opportunity, transmit up to X packets from the queue", and it doesn't matter if it's only one packet.
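
A rough sketch of that single mode, as toy C (illustrative only, not
ath9k code):

#include <stdbool.h>
#include <stdio.h>

#define MAX_AGG_FRAMES 64          /* the aggregation limit, "X" */

static int queue_len;              /* toy stand-in for the tx queue */

static bool txq_empty(void) { return queue_len == 0; }
static void txq_dequeue(void) { queue_len--; }

/* Called on every transmit opportunity: send whatever is queued, up
 * to the limit, even if that is a single frame.  No timers, no
 * "wait for more" state. */
static void on_tx_opportunity(void)
{
	int nframes = 0;

	while (nframes < MAX_AGG_FRAMES && !txq_empty()) {
		txq_dequeue();
		nframes++;
	}
	if (nframes)
		printf("transmit aggregate of %d frame(s)\n", nframes);
}

int main(void)
{
	queue_len = 1;     /* a lone packet goes out immediately */
	on_tx_opportunity();
	queue_len = 100;   /* a backlog aggregates naturally */
	on_tx_opportunity();
	return 0;
}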

If you delay the first packet to give you a chance to aggregate it with others, you add in the complexity and overhead of timers (including cancelling timers, slippage in timers, etc) and you add "first packet, start timers" mode to deal with.

I grant you that the first approach will "saturate" the airtime at lower traffic levels, but at that point all the stations will start aggregating the minimum amount needed to keep the air saturated, while still minimizing latency.

I then expect that application-related optimizations would further complicate the second approach. There are just too many cases where small amounts of data have to be sent and other things serialize behind them.

A DNS lookup to find a domain, then a 3-way handshake, then a request to see if the <web something> library has been updated since last cached (repeat for several libraries), then fetching the actual page content. All of these things up to the actual page content could be single packets that have to be sent (and responded to with a single packet), each waiting for the prior one to complete. If you add a few ms to each of these, you can easily hit 100ms in added latency. Once you start trying to special-case these sorts of things, the code complexity multiplies.

So I believe that the KISS approach ends up with a 'worse is better' situation.

David Lang






--
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org




--
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org
Dave Taht | 13 May 19:49 2016

Re: [Make-wifi-fast] Diagram of the ath9k TX path

I try to stress that single tcp flows should never use all the bandwidth, in order for the sawtooth to function properly.

What happens when you hit it with 4 flows? or 12?

nice graph, but I don't understand the single blue spikes?

On Fri, May 13, 2016 at 10:46 AM, Bob McMahon <bob.mcmahon <at> broadcom.com> wrote:
On driver delays: from a driver development perspective, the problem isn't whether to add delay or not (it shouldn't); it's that the TCP stack isn't presenting sufficient data to fully utilize aggregation. Below is a histogram comparing aggregation across 3 systems (units are mpdus per ampdu). The lowest-latency stack is in purple, and it is also the worst with respect to average throughput. From a driver perspective, one would like TCP to present sufficient bytes into the pipe that the histogram leans toward the blue.


I'm not an expert on TCP congestion avoidance, but maybe the algorithm could benefit from using RTT weighted by CWND (or bytes in flight) and hunting for that maximum?

Bob

On Mon, May 9, 2016 at 8:41 PM, David Lang <david <at> lang.hm> wrote:
On Mon, 9 May 2016, Dave Taht wrote:

On Mon, May 9, 2016 at 7:25 PM, Jonathan Morton <chromatix99 <at> gmail.com> wrote:

On 9 May, 2016, at 18:35, Dave Taht <dave.taht <at> gmail.com> wrote:

should we always wait a little bit to see if we can form an aggregate?

I thought the consensus on this front was “no”, as long as we’re making the decision when we have an immediate transmit opportunity.

I think it is more nuanced than how david lang has presented it.

I have four reasons for arguing for no speculative delays.

1. airtime that isn't used can't be saved.

2. lower best-case latency

3. simpler code

4. clean and gradual service degradation under load.

the arguments against are:

5. throughput per ms of transmit time is better if aggregation happens than if it doesn't.

6. if you don't transmit, some other station may choose to before you would have finished.

#2 is obvious, but with the caveat that anytime you transmit you may be delaying someone else.

#1 and #6 are flip sides of each other. we want _someone_ to use the airtime, the question is who.

#3 and #4 are closely related.

If you follow my approach (transmit immediately if you can, aggregate when you have a queue), the code really has one mode (plus queuing): "if you have a Transmit Opportunity, transmit up to X packets from the queue", and it doesn't matter if it's only one packet.

If you delay the first packet to give you a chance to aggregate it with others, you add in the complexity and overhead of timers (including cancelling timers, slippage in timers, etc) and you add "first packet, start timers" mode to deal with.

I grant you that the first approach will "saturate" the airtime at lower traffic levels, but at that point all the stations will start aggregating the minimum amount needed to keep the air saturated, while still minimizing latency.

I then expect that application-related optimizations would further complicate the second approach. There are just too many cases where small amounts of data have to be sent and other things serialize behind them.

A DNS lookup to find a domain, then a 3-way handshake, then a request to see if the <web something> library has been updated since last cached (repeat for several libraries), then fetching the actual page content. All of these things up to the actual page content could be single packets that have to be sent (and responded to with a single packet), each waiting for the prior one to complete. If you add a few ms to each of these, you can easily hit 100ms in added latency. Once you start trying to special-case these sorts of things, the code complexity multiplies.

So I believe that the KISS approach ends up with a 'worse is better' situation.

David Lang






--
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org
Ben Greear | 11 May 23:55 2016

802.11j (4.9GHz support) for ath9k?

I found some email from 2011 about enabling 4.9GHz channels in
ath9k, but I cannot find any patches, and it does not seem that they
were ever applied upstream.

Anyone have any patches they would like to share for this?

Thanks,
Ben

-- 
Ben Greear <greearb <at> candelatech.com>
Candela Technologies Inc  http://www.candelatech.com
Dave Taht | 10 May 06:59 2016

Re: [Make-wifi-fast] Diagram of the ath9k TX path

This is a very good overview, thank you. I'd like to take apart
station behavior on wifi with a web application... as a straw man.

On Mon, May 9, 2016 at 8:41 PM, David Lang <david <at> lang.hm> wrote:
> On Mon, 9 May 2016, Dave Taht wrote:
>
>> On Mon, May 9, 2016 at 7:25 PM, Jonathan Morton <chromatix99 <at> gmail.com>
>> wrote:
>>>
>>>
>>>> On 9 May, 2016, at 18:35, Dave Taht <dave.taht <at> gmail.com> wrote:
>>>>
>>>> should we always wait a little bit to see if we can form an aggregate?
>>>
>>>
>>> I thought the consensus on this front was “no”, as long as we’re making
>>> the decision when we have an immediate transmit opportunity.
>>
>>
>> I think it is more nuanced than how david lang has presented it.
>
>
> I have four reasons for arguing for no speculative delays.
>
> 1. airtime that isn't used can't be saved.
>
> 2. lower best-case latency
>
> 3. simpler code
>
> 4. clean and gradual service degradation under load.
>
> the arguments against are:
>
> 5. throughput per ms of transmit time is better if aggregation happens than
> if it doesn't.
>
> 6. if you don't transmit, some other station may choose to before you would
> have finished.
>
> #2 is obvious, but with the caveat that anytime you transmit you may be
> delaying someone else.
>
> #1 and #6 are flip sides of each other. we want _someone_ to use the
> airtime, the question is who.
>
> #3 and #4 are closely related.
>
> If you follow my approach (transmit immediately if you can, aggregate when
> you have a queue), the code really has one mode (plus queuing). "If you have
> a Transmit Opportunity, transmit up to X packets from the queue", and it
> doesn't matter if it's only one packet.
>
> If you delay the first packet to give you a chance to aggregate it with
> others, you add in the complexity and overhead of timers (including
> cancelling timers, slippage in timers, etc) and you add "first packet, start
> timers" mode to deal with.
>
> I grant you that the first approach will "saturate" the airtime at lower
> traffic levels, but at that point all the stations will start aggregating
> the minimum amount needed to keep the air saturated, while still minimizing
> latency.
>
> I then expect that application-related optimizations would further
> complicate the second approach. There are just too many cases where small
> amounts of data have to be sent and other things serialize behind them.
>
> A DNS lookup to find a domain, then a 3-way handshake, then a request
> to see if the <web something> library has been updated since last
> cached (repeat for several libraries), then fetching the actual page
> content. All of these things up to the actual page content could be
> single packets that have to be sent (and responded to with a single
> packet), each waiting for the prior one to complete. If you add a few
> ms to each of these, you can easily hit 100ms in added latency. Once
> you start trying to special-case these sorts of things, the code
> complexity multiplies.

Take web page parsing as an example. The first request is a dns
lookup. The second is an http get (which can include a few more round
trips for negotiating SSL). Next comes a flurry of page parsing, in
which the web browser attempts to schedule its requests as best it can
and sends out the relevant dns and tcp flows, and then, typically,
several seconds of data transfer across each set of flows.

Page paint is bound by getting the critical portions of the resulting
data parsed and laid out properly.

Now, I'd really like that early phase to be optimized by APs with
something more like SQF, where a station that appears and does a few
packet exchanges gets priority over stations taking big flows on a
more regular basis, so it more rapidly gets into flow balance with the
other stations.

(and then, for most use cases, like web, exits)

the second phase, of actual transfer, is also bound by RTT. I have no
idea how much thought wifi folk actually put into typical web transfer
delays (20-80ms), but they are there...

...

The idea of the wifi driver waiting a bit to form a better aggregate
to fit into a txop ties into two slightly different timings and flow
behaviors.

If it is taking 10ms to get a txop in the first place, taking more
time to assemble a good batch of packets to fit into "your" txop would
be good.

If it is taking 4ms to transfer your last txop, well, more packets may
arrive for you in that interval and, if you defer feeding the hardware,
feed into your existing flows to keep them going.

Also, classic tcp acking goes out the window with competing acks at layer 2.

I don't know if quic can do the equivalent of stretch acks...

but one layer 3 ack, block acked by layer 2 in wifi, suffices... if
you have a ton of tcp acks outstanding, block acking them all is
expensive...

> So I believe that the KISS approach ends up with a 'worse is better'
> situation.

Code is going to get more complex anyway, and there are other
optimizations that could be made.

One item I realized recently is that part of codel need not run on
every packet in every flow for stuff destined to fit into a single
txop. It is sufficient to see if it declared a drop on the first
packet in a flow destined for a given txop.

You can then mark that entire flow (in a txop) as droppable (QoSNoAck)
within that txop (as it is within an RTT, and even losing all the
packets there will only cause the rate to halve).
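
A sketch of that idea in toy C (names illustrative; the real codel
decision is stubbed out):

#include <stdbool.h>

struct pkt  { bool qos_noack; };
struct flow { bool tested_this_txop; bool droppable; };

/* Stub for the real codel drop test. */
static bool codel_should_drop(const struct pkt *p)
{
	(void)p;
	return false;
}

/* Run codel only on the first packet a flow contributes to this txop;
 * if codel signals congestion there, mark the flow's remaining packets
 * in the txop as QoSNoAck - losing them costs at most one rate halving,
 * since the whole txop fits within an RTT. */
static void classify_for_txop(struct flow *f, struct pkt *p)
{
	if (!f->tested_this_txop) {
		f->tested_this_txop = true;
		f->droppable = codel_should_drop(p);
	}
	if (f->droppable)
		p->qos_noack = true;
}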

>
> David Lang

-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org
Dave Taht | 10 May 04:59 2016

Re: [Make-wifi-fast] Diagram of the ath9k TX path

On Mon, May 9, 2016 at 7:25 PM, Jonathan Morton <chromatix99 <at> gmail.com> wrote:
>
>> On 9 May, 2016, at 18:35, Dave Taht <dave.taht <at> gmail.com> wrote:
>>
>> should we always wait a little bit to see if we can form an aggregate?
>
> I thought the consensus on this front was “no”, as long as we’re making the decision when we have an
immediate transmit opportunity.

I think it is more nuanced than how david lang has presented it. We
haven't argued the finer points just yet - merely seeing 12-20ms
latency across the entire 6-300mbit range I've tested thus far has
been a joy, and I'd like to at least think about ways to cut another
order of magnitude off of that while making better use of packing the
medium.

http://blog.cerowrt.org/post/anomolies_thus_far/

So... I don't think we "achieved consensus", I just faded... I thought
at the time that merely getting down from 2+seconds to 20ms induced
latency was vastly more important :), and I didn't want to belabor the
point until we had some solid results. I'll still settle for "1 agg in
the hardware, 1 agg in the driver"... but smaller, and better formed,
aggs under contention - which might sometimes involve a pause for a
hundred usec to gather up more, when empty, or more, when the driver
is known to be busy.

...

Over the weekend I did some experiments setting the beacon-advertised
txop size for best effort traffic to 94 (the same size as the vi queue
that was so busted in earlier tests:
http://blog.cerowrt.org/post/cs5_lockout/ ) to try to see if the
station (or AP) paid attention to it... the bandwidth symmetry I got
compared to the defaults was remarkable. This chart also shows the
size of the win against the stock ath10k firmware and driver in terms
of latency, and of not having flows collapse...

http://blog.cerowrt.org/flent/txop_94/rtt_fairbe_compared.svg

Now, given that most people use wifi asymmetrically, perhaps there are
fewer use cases where the AP and station work more symmetrically, but
this was a pretty result.

http://blog.cerowrt.org/flent/dual-txop-94/up_down_vastly_better.svg

I haven't finished writing up the result; aside from this, tweaking the
parameter had no apparent effect on the baseline 10-15ms of driver
latency left in it, under load.

>
> If we *don’t* have an immediate transmit opportunity, then we *must* wait regardless, and maybe some
other packets will arrive which can then be aggregated.
>
>  - Jonathan Morton
>

-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org
Dave Taht | 9 May 17:35 2016

Re: [Make-wifi-fast] Diagram of the ath9k TX path

On Mon, May 9, 2016 at 4:00 AM, Toke Høiland-Jørgensen <toke <at> toke.dk> wrote:
> I finally finished my flow diagram of the ath9k TX path (corresponding
> to the previous one I did for the mac80211 stack). In case anyone else
> is interested, it's available here:
>
> https://blog.tohojo.dk/2016/05/the-ath9k-tx-path.html

Looks quite helpful. I do not understand why there is a "fast path" at
all in this driver - should we always wait a little bit to see if we
can form an aggregate?

It would be awesome to be able to adapt michal's work on fq_codeling
things as he did here (leveraging rate control information)

http://blog.cerowrt.org/post/fq_codel_on_ath10k/

rather than dql as he did here:

http://blog.cerowrt.org/post/dql_on_wifi_2/

to the ath9k.

> -Toke

-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org