Nick Piggin | 1 Dec 2008 09:33
Picon

[patch][rfc] fs: shrink struct dentry

Hi,
Comments?
Thanks,
Nick

--
struct dentry is one of the most critical structures in the kernel. So it's
sad to see it going neglected.

With CONFIG_PROFILING turned on (which is probably the common case at least
for distros and kernel developers), sizeof(struct dcache) == 208 here
(64-bit). This gives 19 objects per slab.

I packed d_mounted into a hole, and took another 4 bytes off the inline
name length to take the padding out from the end of the structure. This
shinks it to 200 bytes. I could have gone the other way and increased the
length to 40, but I'm aiming for a magic number, read on...

I then got rid of the d_cookie pointer. This shrinks it to 192 bytes. Rant:
why was this ever a good idea? The cookie system should increase its hash
size or use a tree or something if lookups are a problem. Also the "fast
dcookie lookups" in oprofile should be moved into the dcookie code -- how
can oprofile possibly care about the dcookie_mutex? It gets dropped after
get_dcookie() returns so it can't be providing any sort of protection.

At 192 bytes, 21 objects fit into a 4K page, saving about 3MB on my system
with ~140 000 entries allocated. 192 is also a multiple of 64, so we get
nice cacheline alignment on 64 and 32 byte line systems -- any given dentry
will now require 3 cachelines to touch all fields wheras previously it
would require 4.
(Continue reading)

Andi Kleen | 1 Dec 2008 12:09

Re: [patch][rfc] fs: shrink struct dentry

Nick Piggin <npiggin <at> suse.de> writes:

> Hi,
> Comments?
> Thanks,
> Nick
>
> --
> struct dentry is one of the most critical structures in the kernel. So it's
> sad to see it going neglected.

Very nice. But the sad thing is that such optimizations tend to quickly
bit rot again. At least add big fat comments.

How does it look like on 32bit hosts?

Since the size is variable depending on word size it might be a 
good idea to auto adjust inline name length to always give a nice
end result for slab.

Also I think with some effort it would be possible to shrink it more.
But since you already reached cache lines, it would just allow
to increase inline name length. Ok perhaps it would help more on 32bit.

Further possibilities to shrink: 
- Eliminate name.length. It seems of dubious utility
(in general I'm not sure struct qstr is all that useful)
- Change some of the children/alias list_heads to hlist_heads. I don't
think these lists typically need O(1) access to the end.
- If the maximum mount nest was limited d_mounted could migrate
(Continue reading)

Nick Piggin | 1 Dec 2008 12:26
Picon

Re: [patch][rfc] fs: shrink struct dentry

On Mon, Dec 01, 2008 at 12:09:12PM +0100, Andi Kleen wrote:
> Nick Piggin <npiggin <at> suse.de> writes:
> 
> > Hi,
> > Comments?
> > Thanks,
> > Nick
> >
> > --
> > struct dentry is one of the most critical structures in the kernel. So it's
> > sad to see it going neglected.
> 
> Very nice. But the sad thing is that such optimizations tend to quickly
> bit rot again. At least add big fat comments.

I was tempted to add a "Don't add anything to struct dentry" comment :)

> How does it look like on 32bit hosts?

Actually 32bit does not gain anything from packing d_mounted, but it
does benefit from removing d_cookie, so in that case I just added
another 4 bytes to the inline name length in order to keep it at 128
bytes.

Removing the CONFIG_PROFILING difference is quite nice because the
last thing you want is to try to profile something in the dcache and
find that cache access characteristics change when you enable
oprofile :P

> Since the size is variable depending on word size it might be a 
(Continue reading)

John Levon | 1 Dec 2008 18:51
Favicon

Re: [patch][rfc] fs: shrink struct dentry

On Mon, Dec 01, 2008 at 09:33:43AM +0100, Nick Piggin wrote:

> I then got rid of the d_cookie pointer. This shrinks it to 192 bytes. Rant:
> why was this ever a good idea? The cookie system should increase its hash
> size or use a tree or something if lookups are a problem.

Are you saying you've made this change without even testing its
performance impact?

john

-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
Nick Piggin | 1 Dec 2008 19:04
Picon

Re: [patch][rfc] fs: shrink struct dentry

On Mon, Dec 01, 2008 at 05:51:13PM +0000, John Levon wrote:
> On Mon, Dec 01, 2008 at 09:33:43AM +0100, Nick Piggin wrote:
> 
> > I then got rid of the d_cookie pointer. This shrinks it to 192 bytes. Rant:
> > why was this ever a good idea? The cookie system should increase its hash
> > size or use a tree or something if lookups are a problem.
> 
> Are you saying you've made this change without even testing its
> performance impact?

For oprofile case (maybe if you are profiling hundreds of vmas and
overflow the 4096 byte hash table), no. That case is uncommon and
must be fixed in the dcookie code (as I said, trivial with changing
data structure). I don't want this pointer in struct dentry
regardless of a possible tiny benefit for oprofile.

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

John Levon | 1 Dec 2008 20:38
Favicon

Re: [patch][rfc] fs: shrink struct dentry

On Mon, Dec 01, 2008 at 07:04:55PM +0100, Nick Piggin wrote:

> On Mon, Dec 01, 2008 at 05:51:13PM +0000, John Levon wrote:
> > On Mon, Dec 01, 2008 at 09:33:43AM +0100, Nick Piggin wrote:
> > 
> > > I then got rid of the d_cookie pointer. This shrinks it to 192 bytes. Rant:
> > > why was this ever a good idea? The cookie system should increase its hash
> > > size or use a tree or something if lookups are a problem.
> > 
> > Are you saying you've made this change without even testing its
> > performance impact?
> 
> For oprofile case (maybe if you are profiling hundreds of vmas and
> overflow the 4096 byte hash table), no. That case is uncommon and
> must be fixed in the dcookie code (as I said, trivial with changing
> data structure). I don't want this pointer in struct dentry
> regardless of a possible tiny benefit for oprofile.

Don't you even have a differential profile showing the impact of
removing d_cookie? This hash table lookup will now happen on *every*
userspace sample that's processed. That's, uh, a lot.

(By all means make your change, but I don't get how it's OK to regress
other code, and provide no evidence at all as to its impact.)

john
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
(Continue reading)

Carl Love | 2 Dec 2008 01:18
Picon
Favicon

[Patch 1/3] User OProfile support for the IBM CELL processor SPU event profiling


This patch adds the SPU event profiling support for the IBM Cell
processor to the list of available events. The opcontrol script
patches include a test to see if there is a new cell specific file
in the kernel oprofile file system.  If the file exists, then the
kernel supports SPU event profiling.  

Signed-off-by: Carl Love <carll <at> us.ibm.com>

Index: oprofile-cvs/events/ppc64/cell-be/events
===================================================================
--- oprofile-cvs.orig/events/ppc64/cell-be/events
+++ oprofile-cvs/events/ppc64/cell-be/events
 <at>  <at>  -108,12 +108,42  <at>  <at>  event:0xdbe counters:0,1,2,3 um:PPU_0_cy
 event:0xdbf counters:0,1,2,3 um:PPU_0_cycles          minimum:10000	name:stb_not_empty		: At least one
store gather buffer not empty.

 # Cell BE Island 4 - Synergistic Processor Unit (SPU)
-
+#
+# OPROFILE FOR CELL ONLY SUPPORTS PROFILING ON ONE SPU EVENT AT A TIME
+#
 # CBE Signal Group 41 - SPU (NClk)
+event:0x1004 counters:0 um:SPU_02_cycles          minimum:10000	name:dual_instrctn_commit	: Dual
instruction committed.
+event:0x1005 counters:0 um:SPU_02_cycles          minimum:10000	name:sngl_instrctn_commit	: Single
instruction committed.
+event:0x1006 counters:0 um:SPU_02_cycles          minimum:10000	name:ppln0_instrctn_commit	: Pipeline 0
instruction committed.
+event:0x1007 counters:0 um:SPU_02_cycles          minimum:10000	name:ppln1_instrctn_commit	: Pipeline 1
(Continue reading)

Carl Love | 2 Dec 2008 01:18
Picon
Favicon

[Patch 0/3] Overview, OProfile SPU event profiling support for IBM Cell processor

This is a rework of the previously posted set of patches.

Patch 1 is the user level patch to add the SPU events to the user
OProfile tool.  

Patch 2 is a kernel patch to do code clean up and restructuring to make
it easier to add the new SPU event profiling support.  This patch makes
no functional changes.

Patch 3 is a kernel patch to add the SPU event profiling support.

               Carl Love

-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
Carl Love | 2 Dec 2008 01:18
Picon
Favicon

[Patch 2/3] Kernel patch, IBM CELL processor OProfile cleanup and restructuring


This patch restructures and cleans up the code a bit to make it 
easier to add new functionality later.  The patch makes no 
functional changes to the existing code.

Signed-off-by: Carl Love <carll <at> us.ibm.com>

Index: Cell_kernel_11_10_2008/arch/powerpc/oprofile/op_model_cell.c
===================================================================
--- Cell_kernel_11_10_2008.orig/arch/powerpc/oprofile/op_model_cell.c
+++ Cell_kernel_11_10_2008/arch/powerpc/oprofile/op_model_cell.c
 <at>  <at>  -40,14 +40,9  <at>  <at> 
 #include "../platforms/cell/interrupt.h"
 #include "cell/pr_util.h"

-static void cell_global_stop_spu(void);
-
-/*
- * spu_cycle_reset is the number of cycles between samples.
- * This variable is used for SPU profiling and should ONLY be set
- * at the beginning of cell_reg_setup; otherwise, it's read-only.
- */
-static unsigned int spu_cycle_reset;
+#define PPU_PROFILING            0
+#define SPU_PROFILING_CYCLES     1
+#define SPU_PROFILING_EVENTS     2

 #define NUM_SPUS_PER_NODE    8
 #define SPU_CYCLES_EVENT_NUM 2	/*  event number for SPU_CYCLES */
 <at>  <at>  -66,6 +61,14  <at>  <at>  static unsigned int spu_cycle_reset;
(Continue reading)

Carl Love | 2 Dec 2008 01:18
Picon
Favicon

[Patch 3/3] Kernel patch, IBM CELL processor add OProfile SPU event profiling support


This patch adds the SPU event based profiling funcitonality for the
IBM Cell processor.  Previously, the CELL OProfile kernel code supported
PPU event, PPU cycle profiling and SPU cycle profiling.   The addition of
SPU event profiling allows the users to identify where in their SPU code
various SPU evnets are occuring.  This should help users further identify
issues with their code.  Note, SPU profiling has some limitations due to HW
constraints.  Only one event at a time can be used for profiling and SPU event 
profiling must be time sliced across all of the SPUs in a node.  

The patch adds a new arch specific file to the OProfile file system. The
file has bit 0 set to indicate that the kernel supports SPU event profiling.
The user tool must check this file/bit to make sure the kernel supports
SPU event profiling before trying to do SPU event profiling.  The user tool
check is part of the user tool patch for SPU event profiling.

Signed-off-by: Carl Love <carll <at> us.ibm.com>

Index: Cell_kernel_11_10_2008-new-patches/arch/powerpc/oprofile/op_model_cell.c
===================================================================
--- Cell_kernel_11_10_2008-new-patches.orig/arch/powerpc/oprofile/op_model_cell.c
+++ Cell_kernel_11_10_2008-new-patches/arch/powerpc/oprofile/op_model_cell.c
 <at>  <at>  -44,6 +44,12  <at>  <at> 
 #define SPU_PROFILING_CYCLES     1
 #define SPU_PROFILING_EVENTS     2

+#define SPU_EVENT_NUM_START      4100
+#define SPU_EVENT_NUM_STOP       4399
+#define SPU_PROFILE_EVENT_ADDR          4363  /* spu, address trace, decimal */
+#define SPU_PROFILE_EVENT_ADDR_MASK_A   0x146 /* sub unit set to zero */
(Continue reading)


Gmane