Adam G Litke | 8 Jul 2002 17:58

gcov+kernprof+lockmeter patch

Hello.  I have been playing around with gcov, kernprof, and lockmeter as of
late and have produced a patch to support all three on the same 2.4.18
kernel.  Now, you may be asking yourself why would I want all three in one
patch?  I know you would never get any useful data by enabling all three at
once.  But, you can compile in the tools you want from the same source.
This helps to prevent headaches as the original kernprof and gcov patches
were 2.4.17 and don't apply cleanly to 2.4.18.

It is mainly for my use, but I thought that someone else might also be
interested in this.  If there is interest, I will produce the same patch
for 2.5.  For now, the 2.4.18 patch is on the lse site.

Let me know what you think.

Thanks,
Adam Litke

-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Oh, it's good to be a geek.
http://thinkgeek.com/sf
Shureih, Tariq | 8 Jul 2002 19:40

RE: [normal] - Scalability (Locking primitives - peel off dcache lock ) not integrated

Hanna:

Are you back so we can work on this issue?
Can I give you a call to figure things out?

--
Tariq

-----Original Message-----
From: Hanna Linder [mailto:hannal <at> us.ibm.com] 
Sent: Monday, June 24, 2002 4:16 PM
To: Shureih, Tariq; 'maneesh <at> in.ibm.com'
Cc: 'Hanna Linder'; 'benr <at> us.ibm.com'; 'yoder1 <at> us.ibm.com';
'khoa <at> us.ibm.com'; 'Lisa Perry'; 'lse-tech <at> lists.sourceforge.net'; Whalen,
Mauri; Peddibhotla, Rammohan; Hans-Joachim Tannenberger; gh <at> us.ibm.com;
markv <at> us.ibm.com; abbattis <at> us.ibm.com; Selbak, Rolla N; Villalovos, John L
Subject: RE: [Lse-tech] [normal] - Scalability (Locking primitives - peel
off dcache lock ) not integrated

Tariq,

	Hi. I am going to be out of town until the 30th as well. 
Unless someone else at Intel can integrate this patch it will 
have to wait for a week.

Thanks.

Hanna

--On Monday, June 24, 2002 15:44:22 -0700 "Shureih, Tariq"

Shailabh Nagar | 8 Jul 2002 20:44

Re: gcov+kernprof+lockmeter patch


Hi Adam,

Thanks for doing that! An integrated patch certainly makes life easier.

Have you had a chance to apply kernprof/lockmeter on post-2.5.17 kernels?
I, for one, would be interested in an integrated patch for 2.5.24.

Regards,
Shailabh Nagar
Enterprise Linux Group, IBM TJ Watson Research Center
(914) 945 2851, T/L 862 2851

                                                                                                                                        
From: Adam G Litke/Beaverton/IBM <at> IBMUS
Sent by: lse-tech-admin <at> lists.sourceforge.net
Date: 07/08/2002 11:58 AM
To: lse-tech <at> lists.sourceforge.net
cc:
Subject: [Lse-tech] gcov+kernprof+lockmeter patch

Hello.  I have been playing around with gcov, kernprof, and lockmeter as of
late and have produced a patch to support all three on the same 2.4.18
kernel.  Now, you may be asking yourself why would I want all three in one
patch?  I know you would never get any useful data by enabling all three at
once.  But, you can compile in the tools you want from the same source.
This helps to prevent headaches as the original kernprof and gcov patches
were 2.4.17 and don't apply cleanly to 2.4.18.

Shureih, Tariq | 8 Jul 2002 22:23

Scalability -- Remove Kernel Locking Integrated

Into TLT 1.5 build successfully.

^-^-^-^-^-^-^-^-
Tariq Shureih
Software Engineering & Integration
Intel Corporation -- TSP
(503) 677-6776
tariq.shureih <at> Intel.com

Lisa Perry | 8 Jul 2002 23:09

Re: Scalability -- Remove Kernel Locking Integrated


Yahoo!!!  Thanks for the good news.

Lisa Perry
Strategic Relationship Manager
IBM Linux Technology Center

Office: 503.578.3710    (t/l 775)
Cell: 503.810.1279
Fax: 503.578.3228

From: "Shureih, Tariq" <tariq.shureih <at> intel.com>
Date: 07/08/2002 01:23 PM
To: Lisa Perry/Beaverton/IBM <at> IBMUS, Ben Rafanello/Austin/IBM <at> IBMUS,
    Kathy Bennett/Austin/IBM <at> IBMUS, Kent E Yoder/Austin/IBM <at> IBMUS,
    Khoa Huynh/Austin/IBM <at> IBMUS
cc: "'lse-tech <at> lists.sourceforge.net'" <lse-tech <at> lists.sourceforge.net>
Subject: Scalability -- Remove Kernel Locking Integrated

Into TLT 1.5 build successfully.

^-^-^-^-^-^-^-^-
Tariq Shureih
Software Engineering & Integration
Intel Corporation -- TSP
(503) 677-6776
tariq.shureih <at> Intel.com

rwhron | 9 Jul 2002 02:59

pipe and af/unix latency differences between aa and jam on smp

The -jam patchset is interesting because it starts out
with the entire -aa patchset and adds a few things.

Sometimes small differences in LMbench between -jam and -aa are 
just CPU bounces on SMP.  The difference for pipe and af/unix latency
only appears on SMP too, but it is very consistent.  (My k6/2
has small differences between -aa and -jam for pipe and af/unix
latency).

You will know better what could make the difference:

These are the averages:

*Local* Communication latencies in microseconds - smaller is better
-------------------------------------------------------------------
kernel              Pipe    AF/Unix
-----------------  -------  -------
2.4.19-pre10-aa4    33.941   70.216
2.4.19-pre10-jam2    7.877   16.699

These are the individual runs:

*Local* Communication latencies in microseconds - smaller is better
-------------------------------------------------------------------
OS                              Pipe   AF/Unix
-----------------------------  -------  ------
Linux 2.4.19-pre10-aa4         33.999   73.024 
Linux 2.4.19-pre10-aa4         35.829   73.261 
Linux 2.4.19-pre10-aa4         16.710   74.830 
Linux 2.4.19-pre10-aa4         37.221   66.354 
Linux 2.4.19-pre10-aa4         36.259   68.433 
Linux 2.4.19-pre10-aa4         36.429   68.215 
Linux 2.4.19-pre10-aa4         35.379   77.147 
Linux 2.4.19-pre10-aa4         29.300   73.641 
Linux 2.4.19-pre10-aa4         35.798   64.875 
Linux 2.4.19-pre10-aa4         35.691   75.433 
Linux 2.4.19-pre10-aa4         35.372   73.398 
Linux 2.4.19-pre10-aa4         33.516   69.183 
Linux 2.4.19-pre10-aa4         34.986   69.254 
Linux 2.4.19-pre10-aa4         33.743   69.893 
Linux 2.4.19-pre10-aa4         32.679   71.900 
Linux 2.4.19-pre10-aa4         34.131   71.812 
Linux 2.4.19-pre10-aa4         33.444   72.454 
Linux 2.4.19-pre10-aa4         36.531   71.956 
Linux 2.4.19-pre10-aa4         37.838   69.731 
Linux 2.4.19-pre10-aa4         34.359   71.522 
Linux 2.4.19-pre10-aa4         33.286   71.609 
Linux 2.4.19-pre10-aa4         32.361   43.533 
Linux 2.4.19-pre10-aa4         31.716   74.131 
Linux 2.4.19-pre10-aa4         35.218   72.001 
Linux 2.4.19-pre10-aa4         36.709   67.795 

Linux 2.4.19-pre10-jam2        7.9977   14.495 
Linux 2.4.19-pre10-jam2        7.8406   14.044 
Linux 2.4.19-pre10-jam2        7.7899   14.006 
Linux 2.4.19-pre10-jam2        7.8584   13.819 
Linux 2.4.19-pre10-jam2        7.8379   14.453 
Linux 2.4.19-pre10-jam2        7.8781   14.156 
Linux 2.4.19-pre10-jam2        7.8881   14.238 
Linux 2.4.19-pre10-jam2        7.9833   14.168 
Linux 2.4.19-pre10-jam2        7.7772   78.765 
Linux 2.4.19-pre10-jam2        8.0816   13.703 
Linux 2.4.19-pre10-jam2        7.8605   14.042 
Linux 2.4.19-pre10-jam2        7.7982   13.883 
Linux 2.4.19-pre10-jam2        7.6362   14.286 
Linux 2.4.19-pre10-jam2        7.7480   13.989 
Linux 2.4.19-pre10-jam2        7.9262   13.947 
Linux 2.4.19-pre10-jam2        8.0904   14.014 
Linux 2.4.19-pre10-jam2        7.8480   14.310 
Linux 2.4.19-pre10-jam2        7.7982   14.171 
Linux 2.4.19-pre10-jam2        7.9776   14.234 
Linux 2.4.19-pre10-jam2        7.7931   14.125 
Linux 2.4.19-pre10-jam2        7.8553   14.110 
Linux 2.4.19-pre10-jam2        7.7294   14.285 
Linux 2.4.19-pre10-jam2        8.3361   14.131 
Linux 2.4.19-pre10-jam2        7.7797   14.039 
Linux 2.4.19-pre10-jam2        7.8265   14.043 
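
The averages reported above are arithmetic means, so a single stray run can move them noticeably. As a rough cross-check (a sketch using Python's standard statistics module, not part of the LMbench reporting), the 25 jam2 AF/Unix samples listed above can be summarised with both mean and median:

```python
import statistics

# AF/Unix latencies (microseconds) for 2.4.19-pre10-jam2, copied from the runs above.
jam2_af_unix = [
    14.495, 14.044, 14.006, 13.819, 14.453, 14.156, 14.238, 14.168, 78.765,
    13.703, 14.042, 13.883, 14.286, 13.989, 13.947, 14.014, 14.310, 14.171,
    14.234, 14.125, 14.110, 14.285, 14.131, 14.039, 14.043,
]

mean = statistics.mean(jam2_af_unix)      # pulled upward by the single 78.765 run
median = statistics.median(jam2_af_unix)  # barely affected by that one outlier

print(f"mean={mean:.3f} median={median:.3f}")
```

The mean lands close to the 16.699 shown in the averages table, while the median stays near 14.1, which suggests the typical jam2 AF/Unix latency is a little lower than the average implies.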

For pipe and af/unix bandwidth, the difference appears to just be a
CPU bounce here and there.

jam patchsets are at:
http://giga.cps.unizar.es/~magallon/linux/

--
Randy Hron
http://home.earthlink.net/~rwhron/kernel/bigbox.html

J.A. Magallon | 9 Jul 2002 03:11

Re: pipe and af/unix latency differences between aa and jam on smp


On 2002.07.09 rwhron <at> earthlink.net wrote:
>The -jam patchset is interesting because it starts out
>with the entire -aa patchset and adds a few things.
>
>Sometimes small differences in LMbench between -jam and -aa are 
>just CPU bounces on SMP.  The difference for pipe and af/unix latency
>only appears on SMP too, but it is very consistent.  (My k6/2
>has small differences between -aa and -jam for pipe and af/unix
>latency).
>
>You will know better what could make the difference:
>
>This is the averages:
>
>*Local* Communication latencies in microseconds - smaller is better
>-------------------------------------------------------------------
>kernel              Pipe    AF/Unix
>-----------------  -------  -------
>2.4.19-pre10-aa4    33.941   70.216
>2.4.19-pre10-jam2    7.877   16.699
>

Candidates in pre10-jam2 could be:

11-irqbalance-B1.bz2
12-smptimers-A0.bz2
13-irqrate-A1.bz2

excluding anything that has nothing to do with pipes or latency.

Could you try the latest -rc1-aa2? It also includes irqbalance, so that would
be one variable less in the equation.
I dropped smptimers and irqrate because they did not mix very well with
bproc and the O(1) scheduler, but I can try to add them again.

I have a rc1-jam2 ready, but the only important change wrt SMP could be the
mem-barrier specific implementation for P3/P4, and your box is an AMD.

??

--
J.A. Magallon             \   Software is like sex: It's better when it's free
mailto:jamagallon <at> able.es  \                    -- Linus Torvalds, FSF T-shirt
Linux werewolf 2.4.19-rc1-jam2, Mandrake Linux 8.3 (Cooker) for i586
gcc (GCC) 3.1.1 (Mandrake Linux 8.3 3.1.1-0.7mdk)

J.A. Magallon | 9 Jul 2002 03:25

Re: pipe and af/unix latency differences between aa and jam on smp


On 2002.07.09 rwhron <at> earthlink.net wrote:
>The -jam patchset is interesting because it starts out
>with the entire -aa patchset and adds a few things.
>
>Sometimes small differences in LMbench between -jam and -aa are 
>just CPU bounces on SMP.  The difference for pipe and af/unix latency
>only appears on SMP too, but it is very consistent.  (My k6/2
>has small differences between -aa and -jam for pipe and af/unix
>latency).
>
>You will know better what could make the difference:
>
>This is the averages:
>
>*Local* Communication latencies in microseconds - smaller is better
>-------------------------------------------------------------------
>kernel              Pipe    AF/Unix
>-----------------  -------  -------
>2.4.19-pre10-aa4    33.941   70.216
>2.4.19-pre10-jam2    7.877   16.699
>

I took a look at your numbers:

*Local* Communication latencies in microseconds - smaller is better
-------------------------------------------------------------------
kernel                          Pipe    AF/Unix    UDP    RPC/UDP    TCP    RPC/TCP  TCPconn
-----------------------------  -------  -------  -------  -------  -------  -------  -------
2.4.19-pre7-jam6                29.513   42.369  58.6165  60.7792  50.2572  82.4976   87.321
2.4.19-pre8-jam2                 7.697   15.274  59.6730  60.8190   55.276  82.1297   89.416
2.4.19-pre8-jam2-nowuos          7.739   14.929  57.9326  60.5497  55.9745  81.8908   90.370

(last line says that wake-up-sync is not responsible...)

The main changes between the first two were irqbalance and ide6->ide10.
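
To see at a glance which benchmarks moved between pre7-jam6 and pre8-jam2, the relative change per column can be computed from the table above. This is only a sketch; the 25% cut-off (THRESHOLD) is an arbitrary value chosen for illustration, not something taken from LMbench.

```python
# Average latencies in microseconds, copied from the table above.
columns = ["Pipe", "AF/Unix", "UDP", "RPC/UDP", "TCP", "RPC/TCP", "TCPconn"]
pre7_jam6 = [29.513, 42.369, 58.6165, 60.7792, 50.2572, 82.4976, 87.321]
pre8_jam2 = [7.697, 15.274, 59.6730, 60.8190, 55.276, 82.1297, 89.416]

THRESHOLD = 0.25  # hypothetical cut-off for calling a change "material"


def relative_change(old, new):
    """Fractional change from old to new (negative means lower latency)."""
    return (new - old) / old


changes = {
    name: relative_change(old, new)
    for name, old, new in zip(columns, pre7_jam6, pre8_jam2)
}
flagged = [name for name in columns if abs(changes[name]) > THRESHOLD]

for name in columns:
    print(f"{name:8s} {changes[name]:+7.1%}")
print("changed materially:", flagged)
```

With these numbers only Pipe and AF/Unix exceed the cut-off; the UDP and TCP columns stay within roughly 10% of their earlier values.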

--
J.A. Magallon             \   Software is like sex: It's better when it's free
mailto:jamagallon <at> able.es  \                    -- Linus Torvalds, FSF T-shirt
Linux werewolf 2.4.19-rc1-jam2, Mandrake Linux 8.3 (Cooker) for i586
gcc (GCC) 3.1.1 (Mandrake Linux 8.3 3.1.1-0.7mdk)

Ruth Forester | 9 Jul 2002 07:33

Re: gcov+kernprof+lockmeter patch

Excellent! Thanks for getting this going.

ruth

Ruth Forester
notes: rsf <at> us.ibm.com
linux: lilo <at> us.ibm.com
IBM LTC - Performance Analysis
503-578-4026 (TL: 775-4026)

> 
> Hello.  I have been playing around with gcov, kernprof, and lockmeter as of
> late and have produced a patch to support all three on the same 2.4.18
> kernel.  Now, you may be asking yourself why would I want all three in one
> patch?  I know you would never get any useful data by enabling all three at
> once.  But, you can compile in the tools you want from the same source.
> This helps to prevent headaches as the original kernprof and gcov patches
> were 2.4.17 and don't apply cleanly to 2.4.18.
> 
> It is mainly for my use, but I thought that someone else might also be
> interested in this.  If there is interest, I will produce the same patch
> for 2.5.  For now, the 2.4.18 patch is on the lse site.
> 
> Let me know what you think.
> 
> Thanks,
> Adam Litke
> _______________________________________________
> Lse-tech mailing list
> Lse-tech <at> lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/lse-tech
> 

Dipankar Sarma | 9 Jul 2002 14:20

[OLS] RCU overhead measurements

Here is a more detailed description of the RCU overhead measurement
data presented at OLS 2002. We used kernel profile ticks to roughly 
compare the overheads of different implementations of RCU in order 
to see which features in those implementations help and which don't.

For our experiments, we used the rt_rcu-2.5.3-1.patch (lockfree route cache)
along with 3 different RCU infrastructure implementations. We used
a test suggested by Dave Miller that sends a large number of
packets to random destination addresses changing the address after
every 5 packets. The test was run on an 8-cpu PIII Xeon system
with 6GB RAM. It was run under two environments -

1. Increased neighbor table garbage collection threshold thereby
   reducing the number of updates
2. Default GC threshold that results in a very large number of
   garbage collection and route cache updates due to random
   destination addresses.
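
The traffic pattern described above (a new random destination after every 5 packets) could be generated roughly as follows. This is an illustrative sketch only, not the actual test program: the function name destinations and its parameters are made up here, and a real test would also have to send the packets and probably restrict the address range.

```python
import ipaddress
import random


def destinations(n_packets, per_address=5, seed=None):
    """Yield one destination per packet, picking a new random IPv4 address
    after every `per_address` packets (5 in the test described above)."""
    rng = random.Random(seed)
    current = None
    for i in range(n_packets):
        if i % per_address == 0:
            # New random 32-bit address; a real test would likely exclude
            # reserved and non-routable ranges.
            current = ipaddress.IPv4Address(rng.getrandbits(32))
        yield current


if __name__ == "__main__":
    for n, dst in enumerate(destinations(12, seed=2002)):
        print(n, dst)
```

Because nearly every group of packets hits a destination that is not yet cached, this kind of load keeps creating route cache entries, which is what drives the update rate in the second environment.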

The key implementation features of the 3 RCU infrastructure patches are -

1. rcu_ltimer - Per-CPU queue of RCUs, periodic (10ms) checking for
                CPU going through a quiescent state.
2. rcu_sched  - Per-CPU queue of RCUs, all checking done from scheduler 
                context, global atomic counter of RCUs.
3. rcu_poll   - Global queue of RCUs, single polling tasklet, force
                reschedule on CPUs for completion of RCU grace period.
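
For readers less familiar with the mechanism, here is a toy, single-threaded model of deferred callbacks with per-CPU queues, loosely in the spirit of the rcu_ltimer arrangement. It is not kernel code: the class ToyRcu and the methods quiescent_state and process_callbacks are invented for illustration (call_rcu is borrowed from the routine name in the profiles), and real implementations batch callbacks rather than tracking a CPU set per callback.

```python
from collections import deque


class ToyRcu:
    """Toy model: a callback may run only after every CPU has passed
    through a quiescent state since the callback was queued."""

    def __init__(self, ncpus):
        self.ncpus = ncpus
        # One FIFO of (cpus_still_to_pass, callback) per CPU.
        self.queues = [deque() for _ in range(ncpus)]

    def call_rcu(self, cpu, callback):
        # Queue on the local CPU; all CPUs must still pass a quiescent state.
        self.queues[cpu].append((set(range(self.ncpus)), callback))

    def quiescent_state(self, cpu):
        # Record that `cpu` has passed a quiescent state (e.g. a context switch).
        for queue in self.queues:
            for pending, _ in queue:
                pending.discard(cpu)

    def process_callbacks(self, cpu):
        # Invoke callbacks on this CPU's queue whose grace period has ended.
        queue = self.queues[cpu]
        invoked = 0
        while queue and not queue[0][0]:
            _, callback = queue.popleft()
            callback()
            invoked += 1
        return invoked


if __name__ == "__main__":
    rcu = ToyRcu(ncpus=2)
    freed = []
    rcu.call_rcu(0, lambda: freed.append("old route entry"))
    rcu.quiescent_state(0)
    print(rcu.process_callbacks(0), freed)  # CPU 1 has not passed yet
    rcu.quiescent_state(1)
    print(rcu.process_callbacks(0), freed)  # grace period is over
```

In this model each CPU drains only its own queue. A global-queue design such as rcu_poll would instead funnel every callback through one shared list.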

The patches can be found at http://lse.sf.net/locking/ols2002/rcu/patches.
A description of these as well as some other experimental patches
can be found in http://lse.sf.net/locking/ols2002/rcu/patches/rcu_impl.html.

The raw kernprof outputs are at 
http://lse.sf.net/locking/ols2002/rcu/results/rttest/rttest_davem/.

The numbers represent the profile ticks in the corresponding routines.

1. rt_rcu with neighbor table garbage collection threshold increased to
   prevent frequent overflow (due to random dest addresses).

function                   base  rcu_ltimer  rcu_sched  rcu_poll
--------                   ----  ----------  ---------  --------
ip_route_output_key        4486        2026       2162      2135
call_rcu                      -          11         68        71
rcu_process_callbacks         -           4          -       128
rcu_invoke_callbacks          -           4          -        24
rcu_batch_done                -           -          1         -
schedule                    423         453        459       500
rcu_prepare_polling           -           -          -        13
rcu_polling                   -           -          -       134

2. rt_rcu with frequent neighbor table overflow (due to random dest addresses)

function                   base  rcu_ltimer  rcu_sched  rcu_poll
--------                   ----  ----------  ---------  --------
ip_route_output_key        2358        1646       1619      1641
call_rcu                      -         262       1147      2251
rcu_process_callbacks         -          49          -      2778
rcu_invoke_callbacks          -          57          -       488
rcu_check_quiescent_state     -          27          -         -
rcu_check_callbacks           -          24          -         -
rcu_reg_batch                 -           3          -         -
rcu_batch_done                -           -          3         -
schedule                    294           -        849       808
rcu_prepare_polling           -           -          -        21
rcu_polling                   -           -          -      2756
rcu_completion                -           -          -         4
force_cpu_reschedule          -           -          -       553
__tasklet_hi_schedule         -           -          -      1557

Based on these measurements, we can draw the following conclusions -

1. RCU, implemented the right way, can be beneficial even when
updates are more than rare. At least one patch (rcu_ltimer) shows
a net improvement (gain in ip_route_output_key minus RCU overhead)
under heavy update load.

2. When updates are rare, most implementations work well and show
real benefits.

3. Global queues are out; it is important to have per-CPU queues, as
seen from the overheads in the rcu_poll code.

4. The global atomic RCU counter in rcu_sched probably hurts, as seen
in the call_rcu() overhead, so we should try to avoid such global
counters.
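
The "net improvement" in conclusion 1 can be worked out from the two tables as (base ip_route_output_key ticks minus patched ip_route_output_key ticks) minus the ticks spent in RCU routines. The sketch below does that arithmetic. Which routines count as RCU overhead is an assumption made here: every rcu_* routine and call_rcu, plus force_cpu_reschedule and __tasklet_hi_schedule for rcu_poll, with schedule left out.

```python
# ip_route_output_key ticks on the base kernel (no RCU) in each environment.
BASE_LOOKUP = {"rare_updates": 4486, "frequent_updates": 2358}

# Per patch: ip_route_output_key ticks, then ticks in the routines treated
# as RCU overhead (assumed classification; schedule is excluded).
TICKS = {
    "rare_updates": {
        "rcu_ltimer": (2026, [11, 4, 4]),
        "rcu_sched": (2162, [68, 1]),
        "rcu_poll": (2135, [71, 128, 24, 13, 134]),
    },
    "frequent_updates": {
        "rcu_ltimer": (1646, [262, 49, 57, 27, 24, 3]),
        "rcu_sched": (1619, [1147, 3]),
        "rcu_poll": (1641, [2251, 2778, 488, 21, 2756, 4, 553, 1557]),
    },
}


def net_gain(env, patch):
    """Lookup ticks saved relative to base, minus RCU overhead ticks."""
    lookup, overhead = TICKS[env][patch]
    return (BASE_LOOKUP[env] - lookup) - sum(overhead)


net = {
    env: {patch: net_gain(env, patch) for patch in patches}
    for env, patches in TICKS.items()
}

for env, per_patch in net.items():
    for patch, ticks in per_patch.items():
        print(f"{env:17s} {patch:11s} {ticks:+d}")
```

Under these assumptions all three patches come out ahead when updates are rare, and only rcu_ltimer stays positive under frequent updates, which is consistent with conclusions 1 and 2.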

Comments/suggestions?

Thanks
--
Dipankar Sarma  <dipankar <at> in.ibm.com> http://lse.sourceforge.net
Linux Technology Center, IBM Software Lab, Bangalore, India.
