Paul E. McKenney | 1 May 2010 01:47

Re: [PATCH tip/core/urgent] fix several lockdep splats, allow multiple splats

On Fri, Apr 30, 2010 at 12:07:49PM +0200, Ingo Molnar wrote:
> 
> * Paul E. McKenney <paulmck <at> linux.vnet.ibm.com> wrote:
> 
> > Hello!
> > 
> > This patchset contains four RCU lockdep splat fixes, courtesy of David 
> > Howells, Peter Zijlstra, and Trond Myklebust, [...]
> 
> I've applied #1 and #2 - but shouldn't #4 and #5 go via the NFS tree?

Good point -- I will forward them on to Trond.

							Thanx, Paul
Kent Overstreet | 1 May 2010 02:12

[PATCH 0/3] Bcache: version 4

I've got some documentation incorporated since the last posting. The
user documentation should be sufficient; the code could probably use
more but it's hard for me to say what, so I'll try and add whatever
people find unclear.

Most of the basic functionality is now there; the most visible change is
that it now correctly saves all the metadata, so you can unload a cache
and then reload it, and everything will still be there. I plan on
having read/write in the next version; barring the unexpected, version 5
should be good enough for people to start playing with.

The performance issues I posted about with the last version completely
vanished when I tested it outside of kvm - there was no visible
overhead. I don't know what's going on with kvm; it must be triggering
a pathological corner case somewhere - performance varies wildly for no
good reason. Unfortunately, I don't have the hardware to
do any real performance testing, but from what I've seen so far it's
plenty fast.

The program to make a cache device is attached; the rest is split out
more or less by function. There are more comments along with the hooks
patch.

#define _XOPEN_SOURCE 500

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>

Kent Overstreet | 1 May 2010 02:12

[PATCH 1/3] Bcache: version 4

 Documentation/bcache.txt |   56 ++++++++++++++++++++++++++++++++++++++++++++++
 block/Kconfig            |   15 ++++++++++++
 2 files changed, 71 insertions(+), 0 deletions(-)

diff --git a/Documentation/bcache.txt b/Documentation/bcache.txt
new file mode 100644
index 0000000..dee1514
--- /dev/null
+++ b/Documentation/bcache.txt
@@ -0,0 +1,56 @@
+Say you've got a big slow raid 6, and an X-25E or three. Wouldn't it be
+nice if you could use them as cache... Hence bcache.
+
+It's designed around the performance characteristics of SSDs - it only allocates
+in erase block sized buckets, and it uses a bare minimum btree to track cached
+extents (which can be anywhere from a single sector to the bucket size). It's
+also designed to be very lazy, and use garbage collection to clean stale
+pointers.
+
+Cache devices are used as a pool, and hold data for all the devices that are
+being cached. The cache devices store the UUIDs of devices they have, allowing
+caches to safely be persistent across reboots.
+
+Caching can be transparently enabled and disabled for devices while they are in
+use. All configuration is done via sysfs. To use our SSD sdc to cache our
+raid md1:
+
+  make-bcache /dev/sdc
+  echo "/dev/sdc" > /sys/kernel/bcache/register_cache
+  echo "<UUID> /dev/md1 " > /sys/kernel/bcache/register_dev
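The bcache.txt text says all allocation happens in erase-block-sized buckets and that cached extents range from a single sector up to the bucket size. A minimal sketch of that arithmetic follows, assuming a hypothetical 128 KiB erase block and 512-byte sectors. The `struct extent` layout, `sector_to_bucket()`, `extent_valid()`, and the rule that an extent may not straddle a bucket boundary are all assumptions made for illustration; this is not bcache's on-disk format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical geometry: 128 KiB erase blocks, 512-byte sectors. */
#define SECTOR_SIZE      512u
#define ERASE_BLOCK_SIZE (128u * 1024u)
#define BUCKET_SECTORS   (ERASE_BLOCK_SIZE / SECTOR_SIZE)   /* 256 */

/* Hypothetical extent key: maps a range of the backing device to a
 * location on the cache device. Not bcache's real format. */
struct extent {
	uint64_t dev_sector;    /* start sector on the backing device */
	uint64_t cache_sector;  /* start sector on the cache device */
	uint32_t len;           /* length in sectors */
};

/* Which bucket a cache-device sector falls in. */
static inline uint64_t sector_to_bucket(uint64_t cache_sector)
{
	return cache_sector / BUCKET_SECTORS;
}

/* Valid in this sketch if 1..BUCKET_SECTORS long and contained in a
 * single bucket (the single-bucket rule is an assumption). */
static bool extent_valid(const struct extent *e)
{
	if (e->len == 0 || e->len > BUCKET_SECTORS)
		return false;
	return sector_to_bucket(e->cache_sector) ==
	       sector_to_bucket(e->cache_sector + e->len - 1);
}
```

Because buckets are the allocation unit, reclaiming space means discarding whole buckets, which lines up with how an SSD erases whole blocks.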

Kent Overstreet | 1 May 2010 02:13

[PATCH 2/3] Bcache: version 4

In order to prevent a use/free race, bcache needs to know either when a
read has been queued or when it's been finished. Since we're called from
__generic_make_request, the recursion-to-iteration trick prevents us
from doing the former. But cache hits do no allocation in the fast path;
to decrement a bucket's reference count when the bio's been completed,
we'd have to save and replace bi_end_io, which means we'd be forced to
do an allocation per bio processed - greatly annoying.
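To see why hooking completion costs an allocation, here is a userspace model (not kernel code). The bio has room for only one completion hook and one private pointer, so interposing on completion means stashing the caller's pair somewhere else, and that means one allocation per bio. `struct fake_bio`, `struct bucket`, `hook_bio()` and `hooked_end_io()` are hypothetical names. The callback signature only loosely follows the `bi_end_io(bio, error)` shape of this era.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct bio: just the completion hook. */
struct fake_bio;
typedef void (end_io_fn)(struct fake_bio *, int error);

struct fake_bio {
	end_io_fn *bi_end_io;
	void *bi_private;
};

/* Hypothetical cache bucket that must stay pinned until the read ends. */
struct bucket {
	int pin;
};

/* The bio has no spare field, so the original hook must live here,
 * which is the per-bio allocation the text complains about. */
struct hook_ctx {
	end_io_fn *orig_end_io;
	void *orig_private;
	struct bucket *b;
};

static void hooked_end_io(struct fake_bio *bio, int error)
{
	struct hook_ctx *ctx = bio->bi_private;

	ctx->b->pin--;                     /* read done: unpin the bucket */
	bio->bi_end_io = ctx->orig_end_io; /* restore the caller's hook */
	bio->bi_private = ctx->orig_private;
	free(ctx);
	if (bio->bi_end_io)
		bio->bi_end_io(bio, error);
}

/* Returns 0 on success, -1 if the per-bio allocation fails. */
static int hook_bio(struct fake_bio *bio, struct bucket *b)
{
	struct hook_ctx *ctx = malloc(sizeof(*ctx));

	if (!ctx)
		return -1;
	ctx->orig_end_io = bio->bi_end_io;
	ctx->orig_private = bio->bi_private;
	ctx->b = b;
	b->pin++;
	bio->bi_end_io = hooked_end_io;
	bio->bi_private = ctx;
	return 0;
}
```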

The technically least bad solution I could come up with was to subvert
generic_make_request: I call __generic_make_request directly, and when
that returns, the read is on the request queue; it's my understanding
that discard bios won't get reordered, so we're in the clear.
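The recursion-to-iteration trick mentioned above can be modeled in userspace: a nested submission made while one is already in progress only gets queued, and the outermost call drains the queue in a loop. This is why code running inside the submit path cannot know when its own resubmitted request has actually been queued to the device. This is a simplified single-threaded sketch. `submit()` stands in for generic_make_request(), `handle()` for __generic_make_request() plus a driver, and the static list for current->bio_list. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request type; only what the model needs. */
struct req {
	int id;
	struct req *child;   /* request this one resubmits, if any */
	struct req *next;    /* link on the deferred list */
};

/* Stand-in for current->bio_list (single thread, so plain statics). */
static bool in_submit;
static struct req *head, **tail = &head;

static int log_ids[16], nlog;   /* order in which requests were handled */
static bool child_ran_inline;   /* did a nested submit run synchronously? */

static void submit(struct req *r);

/* Stand-in for __generic_make_request() plus a stacking driver. */
static void handle(struct req *r)
{
	log_ids[nlog++] = r->id;
	if (r->child) {
		int before = nlog;

		submit(r->child);   /* nested call: only queued */
		child_ran_inline = (nlog != before);
	}
}

/* Stand-in for generic_make_request(): nested calls append to the list;
 * the outermost call drains it iteratively instead of recursing. */
static void submit(struct req *r)
{
	r->next = NULL;
	if (in_submit) {
		*tail = r;
		tail = &r->next;
		return;
	}
	in_submit = true;
	head = r;
	tail = &r->next;
	while (head) {
		struct req *cur = head;

		head = cur->next;
		if (!head)
			tail = &head;
		handle(cur);
	}
	in_submit = false;
}
```

Stack depth stays constant however deeply drivers stack, at the cost that `handle()` returns before its child request has been processed.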

I do feel rather dirty for writing it, but it's the best I've come up
with. Stack usage obviously could be an issue, and right now it is -
but that's fixable; I just haven't decided yet what I'm going to do with
the one struct I am putting on the stack. Once I'm done, the additional
stack usage should be only a couple of pointers total.

The other callback I put in __generic_make_request seems to me something
that would be useful to make generic eventually - I can think of a few
things that'd be really useful to have and would need a callback right
there. But I'm of the opinion that that should wait until other users
exist.

The other main thing I did was implement a generic mechanism for
completion of split bios; it seemed to me that getting that right (in
particular, error handling) is subtle enough that there really ought to
be a clear mechanism. It guarantees that however many times a bio is
split, the completion callback is only called once, and if there was an
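One common way to get the "completion callback is only called once" guarantee for split bios is a shared counter of outstanding pieces, where the piece that drops it to zero calls the callback. The message is cut off before saying how errors are reported. This sketch assumes the first error seen is the one passed on, which is an assumption and not necessarily what the patch does. `struct split_parent`, `split_init()`, `split_piece_done()` and `record_done()` are hypothetical, and C11 atomics stand in for kernel atomic_t.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical record shared by all pieces of one split bio. */
struct split_parent {
	atomic_int remaining;   /* pieces still in flight */
	atomic_int error;       /* first nonzero error seen, else 0 */
	void (*done)(struct split_parent *, int error);
	int calls;              /* model only: how often done() ran */
};

static void split_init(struct split_parent *p, int pieces,
		       void (*done)(struct split_parent *, int))
{
	atomic_init(&p->remaining, pieces);
	atomic_init(&p->error, 0);
	p->done = done;
	p->calls = 0;
}

/* Called once per completed piece, in any order, from any thread. */
static void split_piece_done(struct split_parent *p, int error)
{
	if (error) {
		int expected = 0;

		/* assumption: keep only the first error */
		atomic_compare_exchange_strong(&p->error, &expected, error);
	}
	/* the piece that takes the count from 1 to 0 completes the parent */
	if (atomic_fetch_sub(&p->remaining, 1) == 1)
		p->done(p, atomic_load(&p->error));
}

/* Example completion callback that records what it was given. */
static int last_error = -1;
static void record_done(struct split_parent *p, int error)
{
	p->calls++;
	last_error = error;
}
```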

Kent Overstreet | 1 May 2010 02:13

[PATCH 3/3] Bcache: version 4

 block/Makefile |    2 +
 block/bcache.c | 2624 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 2626 insertions(+), 0 deletions(-)

diff --git a/block/Makefile b/block/Makefile
index cb2d515..e9b5fc0 100644
--- a/block/Makefile
+++ b/block/Makefile
@@ -15,3 +15,5 @@ obj-$(CONFIG_IOSCHED_CFQ)	+= cfq-iosched.o
 
 obj-$(CONFIG_BLOCK_COMPAT)	+= compat_ioctl.o
 obj-$(CONFIG_BLK_DEV_INTEGRITY)	+= blk-integrity.o
+
+obj-$(CONFIG_BLK_CACHE)		+= bcache.o
diff --git a/block/bcache.c b/block/bcache.c
new file mode 100644
index 0000000..0f26277
--- /dev/null
+++ b/block/bcache.c
@@ -0,0 +1,2624 @@
+/*
+ * Copyright (C) 2010 Kent Overstreet <kent.overstreet <at> gmail.com>
+ *
+ * Uses a block device as cache for other block devices; optimized for SSDs.
+ * All allocation is done in buckets, which should match the erase block size
+ * of the device.
+ *
+ * Buckets containing cached data are kept on a heap sorted by priority;
+ * bucket priority is increased on cache hit, and periodically all the buckets
+ * on the heap have their priority scaled down. This currently is just used as
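The header comment in bcache.c says buckets with cached data sit on a heap sorted by priority, a hit raises a bucket's priority, and all priorities are periodically scaled down. A minimal sketch follows, assuming a min-heap (so the root is the coldest bucket, the next to reuse), a saturating 16-bit priority, and halving as the "scale down" step. These choices and all names (`struct heap`, `bucket_hit()`, `heap_rescale()`) are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bucket: a priority plus its current slot in the heap. */
struct bucket {
	uint16_t prio;
	size_t idx;
};

#define HEAP_MAX 64
struct heap {
	struct bucket *data[HEAP_MAX];
	size_t n;
};

static void heap_swap(struct heap *h, size_t i, size_t j)
{
	struct bucket *t = h->data[i];

	h->data[i] = h->data[j];
	h->data[j] = t;
	h->data[i]->idx = i;
	h->data[j]->idx = j;
}

static void sift_up(struct heap *h, size_t i)
{
	while (i > 0 && h->data[(i - 1) / 2]->prio > h->data[i]->prio) {
		heap_swap(h, i, (i - 1) / 2);
		i = (i - 1) / 2;
	}
}

static void sift_down(struct heap *h, size_t i)
{
	for (;;) {
		size_t l = 2 * i + 1, r = l + 1, m = i;

		if (l < h->n && h->data[l]->prio < h->data[m]->prio)
			m = l;
		if (r < h->n && h->data[r]->prio < h->data[m]->prio)
			m = r;
		if (m == i)
			return;
		heap_swap(h, i, m);
		i = m;
	}
}

static int heap_push(struct heap *h, struct bucket *b)
{
	if (h->n == HEAP_MAX)
		return -1;
	h->data[h->n] = b;
	b->idx = h->n;
	h->n++;
	sift_up(h, h->n - 1);
	return 0;
}

/* Remove and return the lowest-priority bucket (eviction candidate). */
static struct bucket *heap_pop(struct heap *h)
{
	struct bucket *top;

	if (h->n == 0)
		return NULL;
	top = h->data[0];
	h->n--;
	if (h->n > 0) {
		h->data[0] = h->data[h->n];
		h->data[0]->idx = 0;
		sift_down(h, 0);
	}
	return top;
}

/* Cache hit: raising a key in a min-heap can only move it down. */
static void bucket_hit(struct heap *h, struct bucket *b, uint16_t bump)
{
	unsigned int v = (unsigned int)b->prio + bump;

	b->prio = v > UINT16_MAX ? UINT16_MAX : (uint16_t)v;
	sift_down(h, b->idx);
}

/* Periodic aging: halve every priority. Floor division is monotone,
 * so parent <= child still holds and no re-heapify is needed. */
static void heap_rescale(struct heap *h)
{
	for (size_t i = 0; i < h->n; i++)
		h->data[i]->prio /= 2;
}
```

Scaling everything down periodically lets old hits fade, so a bucket that was hot long ago eventually becomes reusable.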

Paul E. McKenney | 1 May 2010 02:25

[PATCH tip/core/urgent 0/10] v2: Fix RCU lockdep splats

Hello!

This patchset contains ten fixes for various lockdep splats.  The first
two sets are repostings/revisions, the rest new.  The new patches have
all been posted to LKML, but this is the first time for inclusion.

o	rcu: v2: optionally leave lockdep enabled after RCU lockdep splat
	This is a repost that makes the one-splat-per-boot the default,
	but allows those who want multiple splats to get this behavior
	via a new CONFIG_PROVE_RCU_REPEATEDLY configuration parameter.
	(Original from Lai Jiangshan.)

o	KEYS: Fix an RCU warning
	KEYS: Fix an RCU warning in the reading of user keys
	Fixes for RCU-lockdep splats from David Howells for
	security/keys.  Repost of http://lkml.org/lkml/2010/4/22/411.

o	cgroup: Fix an RCU warning in cgroup_path()
	cgroup: Fix an RCU warning in alloc_css_id()
	sched: Fix an RCU warning in print_task()
	cgroup: Check task_lock in task_subsys_state()
	Fixes for new RCU-lockdep splats in cgroups and sched from
	Li Zefan.

o	memcg: css_id() must be called under rcu_read_lock()
	Fixes for new RCU-lockdep splats in memcg from Kamezawa Hiroyuki.

o	blk-cgroup: Fix RCU correctness warning in cfq_init_queue()
	Fix for new RCU-lockdep splat in I/O scheduler from Vivek Goyal.

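The first item in the list above makes one splat per boot the default and lets CONFIG_PROVE_RCU_REPEATEDLY keep checking enabled after a splat. Here is a userspace model of that policy. In the kernel the choice is made at build time, but here it is a runtime flag so both behaviours can be exercised. `rcu_splat()`, `prove_rcu_repeatedly`, `checking_enabled` and `splats` are hypothetical names, and the model does not reproduce how lockdep actually turns itself off.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Model of CONFIG_PROVE_RCU_REPEATEDLY (a Kconfig option in reality). */
static bool prove_rcu_repeatedly;
static bool checking_enabled = true;   /* model of "lockdep still on" */
static int splats;                     /* splats printed so far */

/* Called when an rcu_dereference_check() condition turns out false. */
static void rcu_splat(const char *file, int line)
{
	if (!checking_enabled)
		return;
	splats++;
	fprintf(stderr, "%s:%d invoked rcu_dereference_check() without protection!\n",
		file, line);
	if (!prove_rcu_repeatedly)
		checking_enabled = false;   /* default: one splat per boot */
}
```

One splat per boot keeps a single early bug from flooding the log. The repeated mode suits someone hunting several splats in one test run.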

Paul E. McKenney | 1 May 2010 02:26

[PATCH tip/core/urgent 04/10] cgroup: Fix an RCU warning in cgroup_path()

From: Li Zefan <lizf <at> cn.fujitsu.com>

With CONFIG_PROVE_RCU=y, a warning can be triggered:

  # mount -t cgroup -o debug xxx /mnt
  # cat /proc/$$/cgroup

...
kernel/cgroup.c:1649 invoked rcu_dereference_check() without protection!
...

This is a false-positive, because cgroup_path() can be called
with either rcu_read_lock() held or cgroup_mutex held.

Signed-off-by: Li Zefan <lizf <at> cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck <at> linux.vnet.ibm.com>
---
 kernel/cgroup.c |   12 +++++++++---
 1 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index e2769e1..4ca928d 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1646,7 +1646,9 @@ static inline struct cftype *__d_cft(struct dentry *dentry)
 int cgroup_path(const struct cgroup *cgrp, char *buf, int buflen)
 {
 	char *start;
-	struct dentry *dentry = rcu_dereference(cgrp->dentry);
+	struct dentry *dentry = rcu_dereference_check(cgrp->dentry,
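The cgroup_path() fix rests on the idea that rcu_dereference_check() takes a caller-supplied condition naming every protection that makes the access legal. Here the access is legal under either rcu_read_lock() or cgroup_mutex. The following is a single-threaded userspace model of that idea, not the kernel macro and not the exact condition used in the patch (the diff is truncated above). Every name in it (`rcu_dereference_check_model`, `cgroup_name_model`, the lock-state flags) is hypothetical. In the kernel the check is only active with CONFIG_PROVE_RCU=y.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static int rcu_nesting;          /* model of rcu_read_lock() depth */
static bool cgroup_mutex_held;   /* model of "cgroup_mutex is held" */
static int warnings;             /* splats reported so far */

static void model_rcu_read_lock(void)   { rcu_nesting++; }
static void model_rcu_read_unlock(void) { rcu_nesting--; }
static void model_cgroup_lock(void)     { cgroup_mutex_held = true; }
static void model_cgroup_unlock(void)   { cgroup_mutex_held = false; }

static void check_protection(bool ok, const char *file, int line)
{
	if (!ok) {
		warnings++;
		fprintf(stderr, "%s:%d invoked rcu_dereference_check() without protection!\n",
			file, line);
	}
}

/* Complain if condition c is false, then yield pointer p. */
#define rcu_dereference_check_model(p, c) \
	(check_protection((c), __FILE__, __LINE__), (p))

struct dentry_model { const char *name; };
struct cgroup_model { struct dentry_model *dentry; };

/* Like cgroup_path(): legal under either form of protection. */
static const char *cgroup_name_model(const struct cgroup_model *cgrp)
{
	struct dentry_model *d = rcu_dereference_check_model(cgrp->dentry,
			rcu_nesting > 0 || cgroup_mutex_held);

	return d ? d->name : NULL;
}
```

A plain rcu_dereference() would accept only the first form of protection. That is why a caller holding just the mutex produced the false positive described in the commit message.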

Paul E. McKenney | 1 May 2010 02:26

[PATCH tip/core/urgent 02/10] KEYS: Fix an RCU warning

From: David Howells <dhowells <at> redhat.com>

Fix the following RCU warning:

===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
security/keys/request_key.c:116 invoked rcu_dereference_check() without protection!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 0
1 lock held by keyctl/5372:
 #0:  (key_types_sem){.+.+.+}, at: [<ffffffff811a4e3d>] key_type_lookup+0x1c/0x70

stack backtrace:
Pid: 5372, comm: keyctl Not tainted 2.6.34-rc3-cachefs #150
Call Trace:
 [<ffffffff810515f8>] lockdep_rcu_dereference+0xaa/0xb2
 [<ffffffff811a9220>] call_sbin_request_key+0x156/0x2b6
 [<ffffffff811a4c66>] ? __key_instantiate_and_link+0xb1/0xdc
 [<ffffffff811a4cd3>] ? key_instantiate_and_link+0x42/0x5f
 [<ffffffff811a96b8>] ? request_key_auth_new+0x17b/0x1f3
 [<ffffffff811a8e00>] ? request_key_and_link+0x271/0x400
 [<ffffffff810aba6f>] ? kmem_cache_alloc+0xe1/0x118
 [<ffffffff811a8f1a>] request_key_and_link+0x38b/0x400
 [<ffffffff811a7b72>] sys_request_key+0xf7/0x14a
 [<ffffffff81052227>] ? trace_hardirqs_on_caller+0x10c/0x130
 [<ffffffff81393f5c>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff81001eeb>] system_call_fastpath+0x16/0x1b

Paul E. McKenney | 1 May 2010 02:26

[PATCH tip/core/urgent 03/10] KEYS: Fix an RCU warning in the reading of user keys

From: David Howells <dhowells <at> redhat.com>

Fix an RCU warning in the reading of user keys:

===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
security/keys/user_defined.c:202 invoked rcu_dereference_check() without protection!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 0
1 lock held by keyctl/3637:
 #0:  (&key->sem){+++++.}, at: [<ffffffff811a80ae>] keyctl_read_key+0x9c/0xcf

stack backtrace:
Pid: 3637, comm: keyctl Not tainted 2.6.34-rc5-cachefs #18
Call Trace:
 [<ffffffff81051f6c>] lockdep_rcu_dereference+0xaa/0xb2
 [<ffffffff811aa55f>] user_read+0x47/0x91
 [<ffffffff811a80be>] keyctl_read_key+0xac/0xcf
 [<ffffffff811a8a06>] sys_keyctl+0x75/0xb7
 [<ffffffff81001eeb>] system_call_fastpath+0x16/0x1b

Signed-off-by: David Howells <dhowells <at> redhat.com>
Signed-off-by: Paul E. McKenney <paulmck <at> linux.vnet.ibm.com>
---
 security/keys/user_defined.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)


Paul E. McKenney | 1 May 2010 02:26

[PATCH tip/core/urgent 05/10] cgroup: Fix an RCU warning in alloc_css_id()

From: Li Zefan <lizf <at> cn.fujitsu.com>

With CONFIG_PROVE_RCU=y, a warning can be triggered:

  # mount -t cgroup -o memory xxx /mnt
  # mkdir /mnt/0

...
kernel/cgroup.c:4442 invoked rcu_dereference_check() without protection!
...

This is a false-positive. It's safe to directly access parent_css->id.

Signed-off-by: Li Zefan <lizf <at> cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck <at> linux.vnet.ibm.com>
---
 kernel/cgroup.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 4ca928d..3a53c77 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -4561,13 +4561,13 @@ static int alloc_css_id(struct cgroup_subsys *ss, struct cgroup *parent,
 {
 	int subsys_id, i, depth = 0;
 	struct cgroup_subsys_state *parent_css, *child_css;
-	struct css_id *child_id, *parent_id = NULL;
+	struct css_id *child_id, *parent_id;


