Mingming Cao | 1 Sep 2007 02:01

Re: [RFC 1/4] Large Blocksize support for Ext2/3/4

On Wed, 2007-08-29 at 17:47 -0700, Mingming Cao wrote:

> Just rebased to 2.6.23-rc4 and against the ext4 patch queue. Compile tested only.
> 
> Next steps:
> Need e2fsprogs changes to be able to test this feature, as mkfs needs to be
> educated not to assume rec_len is the blocksize all the time.
> Will try it with Christoph Lameter's large block patch next.
> 

Two problems were found when testing largeblock on ext3.  Patches to
follow. 

Good news is, with your changes, plus all these extN changes, I am able
to run ext2/3/4 with 64k block size, tested on x86 and ppc64 with 4k
page size. fsx test runs fine for an hour on ext3 with 16k blocksize on
x86:-)

Mingming

Mingming Cao | 1 Sep 2007 02:12

[RFC 1/2] JBD: slab management support for large block(>8k)

From clameter:
Teach jbd/jbd2 slab management to support >8k block size. Without this, ext3 refuses to mount with a block size >8k.

Signed-off-by: Mingming Cao <cmm@us.ibm.com>

Index: my2.6/fs/jbd/journal.c
===================================================================
--- my2.6.orig/fs/jbd/journal.c	2007-08-30 18:40:02.000000000 -0700
+++ my2.6/fs/jbd/journal.c	2007-08-31 11:01:18.000000000 -0700
@@ -1627,16 +1627,17 @@ void * __jbd_kmalloc (const char *where,
  * jbd slab management: create 1k, 2k, 4k, 8k slabs as needed
  * and allocate frozen and commit buffers from these slabs.
  *
- * Reason for doing this is to avoid, SLAB_DEBUG - since it could
- * cause bh to cross page boundary.
+ * (Note: We only seem to need the definitions here for the SLAB_DEBUG
+ * case. In non debug operations SLUB will find the corresponding kmalloc
+ * cache and create an alias. --clameter)
  */
-
-#define JBD_MAX_SLABS 5
-#define JBD_SLAB_INDEX(size)  (size >> 11)
+#define JBD_MAX_SLABS 7
+#define JBD_SLAB_INDEX(size)  get_order((size) << (PAGE_SHIFT - 10))

 static struct kmem_cache *jbd_slab[JBD_MAX_SLABS];
 static const char *jbd_slab_names[JBD_MAX_SLABS] = {
-	"jbd_1k", "jbd_2k", "jbd_4k", NULL, "jbd_8k"
+	"jbd_1k", "jbd_2k", "jbd_4k", "jbd_8k",
+	"jbd_16k", "jbd_32k", "jbd_64k"
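For reference, a minimal user-space sketch of the index calculation in the hunk above. It assumes a 4k page size (PAGE_SHIFT of 12) and uses a stand-in for the kernel's get_order(). The old macro is kept under the made-up name JBD_SLAB_INDEX_OLD for comparison only.

```c
/* Assumption: 4k pages, as on the x86 and ppc64 test machines mentioned above. */
#define PAGE_SHIFT 12

/*
 * User-space stand-in for the kernel's get_order(): the smallest order
 * such that (1 << order) pages cover "size" bytes.
 */
static int get_order(unsigned long size)
{
	int order = 0;

	size = (size - 1) >> PAGE_SHIFT;
	while (size) {
		order++;
		size >>= 1;
	}
	return order;
}

/* Same expression as the patched macro. */
#define JBD_SLAB_INDEX(size)  get_order((size) << (PAGE_SHIFT - 10))

/* Expression of the macro being removed, renamed here for comparison. */
#define JBD_SLAB_INDEX_OLD(size)  ((size) >> 11)
```

With these definitions, block sizes of 1k through 64k map to indexes 0 through 6, which matches the seven slab names. The old expression maps 8k to index 4 (hence the NULL at index 3 in the old name table) and 16k to index 8, which is past the end of the old five-entry array.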

Mingming Cao | 1 Sep 2007 02:12

[RFC 2/2] JBD: blocks reservation fix for large block support

With large block support in the VM, the number of blocks per page can be less than
or equal to 1. This patch fixes the calculation of the number of blocks to reserve
in the journal for the case blocksize > pagesize.

Signed-off-by: Mingming Cao <cmm@us.ibm.com>

Index: my2.6/fs/jbd/journal.c
===================================================================
--- my2.6.orig/fs/jbd/journal.c	2007-08-31 13:27:16.000000000 -0700
+++ my2.6/fs/jbd/journal.c	2007-08-31 13:28:18.000000000 -0700
@@ -1611,7 +1611,12 @@ void journal_ack_err(journal_t *journal)

 int journal_blocks_per_page(struct inode *inode)
 {
-	return 1 << (PAGE_CACHE_SHIFT - inode->i_sb->s_blocksize_bits);
+	int bits = PAGE_CACHE_SHIFT - inode->i_sb->s_blocksize_bits;
+
+	if (bits > 0)
+		return 1 << bits;
+	else
+		return 1;
 }

 /*
Index: my2.6/fs/jbd2/journal.c
===================================================================
--- my2.6.orig/fs/jbd2/journal.c	2007-08-31 13:32:21.000000000 -0700
+++ my2.6/fs/jbd2/journal.c	2007-08-31 13:32:30.000000000 -0700
@@ -1612,7 +1612,12 @@ void jbd2_journal_ack_err(journal_t *jou

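A user-space sketch of the fixed calculation, assuming 4k pages (PAGE_CACHE_SHIFT of 12). The helper name blocks_per_page() and its integer argument are made up for illustration; the kernel function takes an inode. With blocksize > pagesize the shift count goes negative, and shifting by a negative count is undefined behaviour in C, so the result is clamped to 1.

```c
/* Assumption: 4k pages, so PAGE_CACHE_SHIFT is 12. */
#define PAGE_CACHE_SHIFT 12

/*
 * Stand-in for journal_blocks_per_page() that takes the block size
 * bits directly instead of reading them from an inode's superblock.
 */
static int blocks_per_page(int blocksize_bits)
{
	int bits = PAGE_CACHE_SHIFT - blocksize_bits;

	/* Only shift when the count is positive; otherwise one block
	 * covers at least a whole page. */
	if (bits > 0)
		return 1 << bits;
	else
		return 1;
}
```

For example, a 1k block (10 bits) gives 4 blocks per page, while 4k (12 bits) and 64k (16 bits) blocks both give 1.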

Christoph Lameter | 1 Sep 2007 03:11

Re: [00/36] Large Blocksize Support V6

Thanks to some help from Mingming Cao we now have support for extX with up to 
64k blocksize. There were several issues in the jbd layer.... (The ext2 
patch that Christoph complained about was dropped).

The patchset can be tested as follows (assuming one has a current git tree):

git checkout -b largeblock
git pull git://git.kernel.org/pub/scm/linux/kernel/git/christoph/largeblocksize.git largeblock

... Fiddle around with large blocksize functionality....

git checkout master

... Back to Linus' tree.

git branch -D largeblock

... Get rid of it.

commit ed541c23b8e71a0217fd96d1b421992fdd7519df
Author: Mingming Cao <cmm@us.ibm.com>

    JBD: blocks reservation fix for large block support

commit a1eaa33cf1600f18e961f1cf5c87820bca44df08
Author: Christoph Lameter <clameter@sgi.com>

    Teach jbd/jbd2 slab management to support >8k block size.

commit 8199976e04333d66202edcaec6cef46771ed194e

Christoph Lameter | 1 Sep 2007 03:41

[RFC 02/26] SLUB: Move count_partial()

Move the counting function for objects in partial slabs so that it is placed
before kmem_cache_shrink. We will need to use it to establish the
fragmentation ratio of per node slab lists.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 mm/slub.c |   26 +++++++++++++-------------
 1 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 45c76fe..aad6f83 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2595,6 +2595,19 @@ void kfree(const void *x)
 }
 EXPORT_SYMBOL(kfree);

+static unsigned long count_partial(struct kmem_cache_node *n)
+{
+	unsigned long flags;
+	unsigned long x = 0;
+	struct page *page;
+
+	spin_lock_irqsave(&n->list_lock, flags);
+	list_for_each_entry(page, &n->partial, lru)
+		x += page->inuse;
+	spin_unlock_irqrestore(&n->list_lock, flags);
+	return x;
+}
+

Christoph Lameter | 1 Sep 2007 03:41

[RFC 00/26] Slab defragmentation V5

Slab defragmentation is mainly an issue if Linux is used as a fileserver
and large amounts of dentries, inodes and buffer heads accumulate. In some
load situations the slabs become very sparsely populated so that a lot of
memory is wasted by slabs that only contain one or a few objects. In
extreme cases the performance of a machine will become sluggish since
we are continually running reclaim. Slab defragmentation adds the
capability to recover wasted memory.

For lumpy reclaim slab defragmentation can be used to enhance the
ability to recover larger contiguous areas of memory. Lumpy reclaim currently
cannot do anything if a slab page is encountered. With slab defragmentation
that slab page can be removed and a large contiguous page freed. It may
be possible to have slab pages also part of ZONE_MOVABLE (Mel's defrag
scheme in 2.6.23) or the MOVABLE areas (antifrag patches in mm).

The trouble with this patchset is that it is difficult to validate.
Activities are only performed when special load situations are encountered.
Are there any tests that could give meaningful information about
the effectiveness of these measures? I have run various tests here
creating and deleting files and building kernels under low memory situations
to trigger these reclaim mechanisms but how does one measure their
effectiveness?

The patchset is also available via git

git pull git://git.kernel.org/pub/scm/linux/kernel/git/christoph/slab.git defrag

We currently support the following types of reclaim:

1. dentry cache

Christoph Lameter | 1 Sep 2007 03:41

[RFC 05/26] SLUB: Replace ctor field with ops field in /sys/slab/*

Create an ops field in /sys/slab/*/ops to contain all the operations defined
on a slab. This will be used to display the additional operations that we
will define soon.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 mm/slub.c |   16 +++++++++-------
 1 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index f95a760..fc2f1e3 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3501,16 +3501,18 @@ static ssize_t order_show(struct kmem_cache *s, char *buf)
 }
 SLAB_ATTR_RO(order);

-static ssize_t ctor_show(struct kmem_cache *s, char *buf)
+static ssize_t ops_show(struct kmem_cache *s, char *buf)
 {
-	if (s->ctor) {
-		int n = sprint_symbol(buf, (unsigned long)s->ctor);
+	int x = 0;

-		return n + sprintf(buf + n, "\n");
+	if (s->ctor) {
+		x += sprintf(buf + x, "ctor : ");
+		x += sprint_symbol(buf + x, (unsigned long)s->ctor);
+		x += sprintf(buf + x, "\n");
 	}

Christoph Lameter | 1 Sep 2007 03:41

[RFC 06/26] SLUB: Add get() and kick() methods

Add the two methods needed for defragmentation and add the display of the
methods via the proc interface.

Add documentation explaining the use of these methods.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 include/linux/slab.h     |    3 +++
 include/linux/slub_def.h |   32 ++++++++++++++++++++++++++++++++
 mm/slub.c                |   32 ++++++++++++++++++++++++++++++--
 3 files changed, 65 insertions(+), 2 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index d859354..848e9a7 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -54,6 +54,9 @@ struct kmem_cache *kmem_cache_create(const char *, size_t, size_t,
 			void (*)(void *, struct kmem_cache *, unsigned long));
 void kmem_cache_destroy(struct kmem_cache *);
 int kmem_cache_shrink(struct kmem_cache *);
+void kmem_cache_setup_defrag(struct kmem_cache *s,
+	void *(*get)(struct kmem_cache *, int nr, void **),
+	void (*kick)(struct kmem_cache *, int nr, void **, void *private));
 void kmem_cache_free(struct kmem_cache *, void *);
 unsigned int kmem_cache_size(struct kmem_cache *);
 const char *kmem_cache_name(struct kmem_cache *);
diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index 291881d..69c32a7 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
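A user-space mock of how a cache might register the two methods, to make the calling convention concrete. Only the kmem_cache_setup_defrag() prototype is taken from the hunk above. The struct layout, the object type, the helper defrag_objects(), and the semantics assumed here (get() pins the objects and returns a private cookie, kick() then drops the pin and disposes of objects nobody else holds) are illustrative assumptions, not the kernel implementation.

```c
#include <stddef.h>

/* Hypothetical stand-in: the real struct kmem_cache is kernel-internal. */
struct kmem_cache {
	void *(*get)(struct kmem_cache *, int nr, void **);
	void (*kick)(struct kmem_cache *, int nr, void **, void *private);
};

/* Same prototype as the declaration added to slab.h. */
void kmem_cache_setup_defrag(struct kmem_cache *s,
	void *(*get)(struct kmem_cache *, int nr, void **),
	void (*kick)(struct kmem_cache *, int nr, void **, void *private))
{
	s->get = get;
	s->kick = kick;
}

/* Hypothetical reference-counted object living in the cache. */
struct obj {
	int refs;
	int evicted;
};

/* Assumed role of get(): pin each of the nr objects in v[]. */
static void *obj_get(struct kmem_cache *s, int nr, void **v)
{
	int i;

	(void)s;
	for (i = 0; i < nr; i++)
		((struct obj *)v[i])->refs++;
	return NULL;	/* no private data needed in this sketch */
}

/* Assumed role of kick(): drop the pin, dispose of unreferenced objects. */
static void obj_kick(struct kmem_cache *s, int nr, void **v, void *private)
{
	int i;

	(void)s;
	(void)private;
	for (i = 0; i < nr; i++) {
		struct obj *o = v[i];

		if (--o->refs == 0)
			o->evicted = 1;
	}
}

/* Hypothetical driver: what a defrag pass over one slab might do. */
static void defrag_objects(struct kmem_cache *s, int nr, void **v)
{
	void *private;

	if (!s->get || !s->kick)
		return;		/* cache does not support defragmentation */
	private = s->get(s, nr, v);
	s->kick(s, nr, v, private);
}
```

In this mock, an object with no other holders is evicted by the pass, while one that already had a reference survives with its count unchanged.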

Christoph Lameter | 1 Sep 2007 03:41

[RFC 01/26] SLUB: Extend slabinfo to support -D and -C options

-D lists caches that support defragmentation

-C lists caches that use a ctor.

Change field names for defrag_ratio and remote_node_defrag_ratio.

Add determination of the allocation ratio for slab. The allocation ratio
is the percentage of available slots for objects in use.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 Documentation/vm/slabinfo.c |   52 ++++++++++++++++++++++++++++++++++++------
 1 files changed, 44 insertions(+), 8 deletions(-)

diff --git a/Documentation/vm/slabinfo.c b/Documentation/vm/slabinfo.c
index 1af7bd5..1319756 100644
--- a/Documentation/vm/slabinfo.c
+++ b/Documentation/vm/slabinfo.c
@@ -30,6 +30,8 @@ struct slabinfo {
 	int hwcache_align, object_size, objs_per_slab;
 	int sanity_checks, slab_size, store_user, trace;
 	int order, poison, reclaim_account, red_zone;
+	int defrag, ctor;
+	int defrag_ratio, remote_node_defrag_ratio;
 	unsigned long partial, objects, slabs;
 	int numa[MAX_NODES];
 	int numa_partial[MAX_NODES];
@@ -56,6 +58,8 @@ int show_slab = 0;
 int skip_zero = 1;
 int show_numa = 0;
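A small sketch of the allocation ratio as described above: the percentage of available object slots that are in use. The parameter names follow the objects, slabs and objs_per_slab fields of struct slabinfo shown in the hunk. The function name and the handling of an empty cache are assumptions, since the actual code is cut off here.

```c
/*
 * Percentage of object slots in use for one cache.
 * Assumption: a cache with no slabs is reported as 100%, since
 * there are no unused slots to reclaim.
 */
static unsigned long allocation_ratio(unsigned long objects,
				      unsigned long slabs,
				      unsigned long objs_per_slab)
{
	unsigned long slots = slabs * objs_per_slab;

	if (!slots)
		return 100;
	return objects * 100 / slots;
}
```

For instance, 50 objects spread over 10 slabs of 10 slots each gives a ratio of 50, and a low ratio indicates a sparsely populated cache.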

Christoph Lameter | 1 Sep 2007 03:41

[RFC 24/26] dentries: Add constructor

In order to support defragmentation on the dentry cache we need to have
a determined object state at all times. Without a constructor the object
would have a random state after allocation.

So provide a constructor.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 fs/dcache.c |   26 ++++++++++++++------------
 1 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 71e4877..282a467 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -874,6 +874,16 @@ static struct shrinker dcache_shrinker = {
 	.seeks = DEFAULT_SEEKS,
 };

+void dcache_ctor(void *p, struct kmem_cache *s, unsigned long flags)
+{
+	struct dentry *dentry = p;
+
+	spin_lock_init(&dentry->d_lock);
+	dentry->d_inode = NULL;
+	INIT_LIST_HEAD(&dentry->d_lru);
+	INIT_LIST_HEAD(&dentry->d_alias);
+}
+
 /**
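A user-space illustration of why a constructor gives every slot a determined state. All names here (struct fake_dentry, fake_ctor(), slab_init(), count_unused()) are hypothetical, with fields loosely mirroring the ones dcache_ctor() sets. The assumption is that a defragmentation pass may look at any slot in a slab, including slots that were never handed out.

```c
#include <stddef.h>

/* Hypothetical cut-down object, not the kernel's struct dentry. */
struct fake_dentry {
	void *d_inode;
	struct fake_dentry *lru_next, *lru_prev;
};

/* Put one slot into a known state, in the spirit of dcache_ctor(). */
static void fake_ctor(void *p)
{
	struct fake_dentry *d = p;

	d->d_inode = NULL;
	d->lru_next = d->lru_prev = d;	/* empty list, like INIT_LIST_HEAD */
}

/* Run the constructor on every slot when the "slab" is set up. */
static void slab_init(struct fake_dentry *slab, int n)
{
	int i;

	for (i = 0; i < n; i++)
		fake_ctor(&slab[i]);
}

/* A scanner can now inspect every slot without reading garbage. */
static int count_unused(const struct fake_dentry *slab, int n)
{
	int i, unused = 0;

	for (i = 0; i < n; i++)
		if (slab[i].d_inode == NULL)
			unused++;
	return unused;
}
```

Without the slab_init() step, count_unused() would be reading uninitialized memory for slots that had never been allocated.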

