Peter Pan | 1 Jul 2010 03:56
Picon

Re: [PATCH] ext2:delete misused ext2_free_blocks

On 07/01/2010 06:57 AM, Dan Carpenter wrote:
> On Wed, Jun 30, 2010 at 06:21:27PM +0800, Peter Pan wrote:
>    
>> if ext2_new_blocks returns error, no blocks need to be freed.
>>
>>      
> Hi Peter,
>
> Your patch isn't right.  The original code is OK as is.
>
> Are you seeing a kernel panic?  Perhaps we can help you fix it.
>    
Oh, I know that the original code is good now.
I didn't see a kernel panic, thank you for your guide.
> regards,
> dan carpenter
>
>    
>> Signed-off-by: Peter Pan<wppan <at> redflag-linux.com>
>> ---
>>   fs/ext2/inode.c |    4 ++--
>>   1 files changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
>> index 3675088..f858847 100644
>> --- a/fs/ext2/inode.c
>> +++ b/fs/ext2/inode.c
>>  <at>  <at>  -385,6 +385,8  <at>  <at>  static int ext2_alloc_blocks(struct inode *inode,
>>   	ext2_fsblk_t current_block = 0;
>>   	int ret = 0;
(Continue reading)

Dave Chinner | 1 Jul 2010 05:33

Re: [patch 52/52] fs: icache less I_FREEING time

On Wed, Jun 30, 2010 at 10:14:52PM +1000, Nick Piggin wrote:
> On Wed, Jun 30, 2010 at 08:13:54PM +1000, Dave Chinner wrote:
> > On Thu, Jun 24, 2010 at 01:03:04PM +1000, npiggin <at> suse.de wrote:
> > > Problem with inode reclaim is that it puts inodes into I_FREEING state
> > > and then continues to gather more, during which it may iput,
> > > invalidate_mapping_pages, be preempted, etc. Holding these inodes in
> > > I_FREEING can cause pauses.
> > 
> > What sort of pauses? I can't see how holding a few inodes in
> > I_FREEING state would cause any serious sort of holdoff...
> 
> Well if the inode is accessed again, it has to wait for potentially
> hundreds of inodes to be found from the LRU, pagecache invalidated,
> and destroyed.

So it's a theoretical concern you have, not something that's
actually been demonstrated as a problem?

As it is, If the inode is accessed immediately after teardown has
started, then we failed to hold on to the inode at a higher level
for long enough. Changing the I_FREEING behaviour is trying to
address the issue at the wrong level...

Cheers,

Dave.
--

-- 
Dave Chinner
david <at> fromorbit.com
--
(Continue reading)

Nick Piggin | 1 Jul 2010 10:00
Picon

Re: [patch 44/52] fs: icache per-CPU sb inode lists and locks

On Thu, Jul 01, 2010 at 01:12:00PM +1000, Dave Chinner wrote:
> On Wed, Jun 30, 2010 at 10:08:50PM +1000, Nick Piggin wrote:
> > > I can't say that I'm a great fan of hiding loop context in defines
> > > like this. It reminds far too much of how parts of Irix slowly
> > > ossified because they ended up mess of complex, fragile macros that
> > > nobody fully understood...
> > 
> > It's not perfect. I think it is a lot better than open coding
> > (which I tried before) because that really muddies up the intention
> > of the loop body.
> 
> Something like this doesn't seem particularly bad:
> 
> static inline struct list_head *
> inode_get_sb_list(struct super_block *sb, int *i)
> {
> 	int cpu;
> 
> 	cpu = cpumask_next(i, cpu_possible_mask);
> 	if (cpu >= nr_cpu_ids)
> 		return NULL;
> 	*i = cpu;
> #ifdef CONFIG_SMP
> 	return per_cpu_ptr(sb->s_inodes, cpu);
> #else
> 	return &sb->s_inodes;
> #endif
> }
> 
> and:
(Continue reading)

Nick Piggin | 1 Jul 2010 10:20
Picon

Re: [patch 00/52] vfs scalability patches updated

On Thu, Jul 01, 2010 at 01:56:57PM +1000, Dave Chinner wrote:
> On Wed, Jun 30, 2010 at 10:40:49PM +1000, Nick Piggin wrote:
> > On Wed, Jun 30, 2010 at 09:30:54PM +1000, Dave Chinner wrote:
> > > On Thu, Jun 24, 2010 at 01:02:12PM +1000, npiggin <at> suse.de wrote:
> > > > Performance:
> > > > Last time I was testing on a 32-node Altix which could be considered as not a
> > > > sweet-spot for Linux performance target (ie. improvements there may not justify
> > > > complexity). So recently I've been testing with a tightly interconnected
> > > > 4-socket Nehalem (4s/32c/64t). Linux needs to perform well on this size of
> > > > system.
> > > 
> > > Sure, but I have to question how much of this is actually necessary?
> > > A lot of it looks like scalability for scalabilities sake, not
> > > because there is a demonstrated need...
> > 
> > People are complaining about vfs scalability already (at least Intel,
> > Google, IBM, and networking people). By the time people start shouting,
> > it's too late because it will take years to get the patches merged. I'm
> > not counting -rt people who have a bad time with global vfs locks.
> 
> I'm not denying it that we need to do work here - I'm questioning
> the "change everything at once" approach this patch set takes.
> You've started from the assumption that everything the dcache_lock
> and inode_lock protect are a problem and goes from there.
> 
> However, if we move some things out fom under the dcache lock, then
> the pressure on the lock goes down and the remaining operations may
> not hinder scalability. That's what I'm trying to understand, and
> why I'm suggesting that you need to break this down into smaller,
> more easily verifable, benchamrked patch sets. IMO, I have no way of
(Continue reading)

Andi Kleen | 1 Jul 2010 19:28

Re: [patch 00/52] vfs scalability patches updated

Nick Piggin <npiggin <at> suse.de> writes:
>
> What's that good for? A single threaded, cached `git diff` on the linux
> kernel tree takes just 81% of the time after the vfs patches (0.27s vs
> 0.33s).

That's very cool!

Hopefully we can make some progress on the whole patchkit now.

-Andi
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Linus Torvalds | 1 Jul 2010 19:35
Gravatar

Re: [patch 00/52] vfs scalability patches updated

On Wed, Jun 30, 2010 at 5:40 AM, Nick Piggin <npiggin <at> suse.de> wrote:
>>
>> That's a pretty big ouch. Why does RCU freeing of inodes cause that
>> much regression? The RCU freeing is out of line, so where does the big
>> impact come from?
>
> That comes mostly from inability to reuse the cache-hot inode structure,
> and the cost to go over the deferred RCU list and free them after they
> get cache cold.

I do wonder if this isn't a big design bug.

Most of the time with RCU, we don't need to wait to actually do the
_freeing_ of the individual data structure, we only need to make sure
that the data structure remains of the same _type_. IOW, we can free
it (and re-use it), but the backing storage cannot be released to the
page cache. That's what SLAB_DESTROY_BY_RCU should give us.

Is that not possible in this situation? Do we really need to keep the
inode _identity_ around for RCU?

If you use just SLAB_DESTROY_BY_RCU, then inode re-use remains, and
cache behavior would be much improved. The usual requirement for
SLAB_DESTROY_BY_RCU is that you only touch a lock (and perhaps
re-validate the identity) in the RCU-reader paths. Could that be made
to work?

Because that 27% drop really is pretty distressing.

That said, open (of the non-creating kind), close, and stat are
(Continue reading)

Andi Kleen | 1 Jul 2010 19:36

Re: [patch 00/52] vfs scalability patches updated

Dave Chinner <david <at> fromorbit.com> writes:
>
> I'm not denying it that we need to do work here - I'm questioning
> the "change everything at once" approach this patch set takes.
> You've started from the assumption that everything the dcache_lock
> and inode_lock protect are a problem and goes from there.

Global code locks in a core subsystem are definitely a problem.

In many ways they're as bad a a BKL. There will be always
workloads where they hurt. They are bad coding style.
They just have to go.

I don't understand how anyone can even defend them.

Especially bad are code locks that protect lots of different
things. Those are not only bad for scalability, but also 
bad for maintainability, because few people can really
understand them even. With smaller well defined locks
that's usually easier.

-Andi

--

-- 
ak <at> linux.intel.com -- Speaking for myself only.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

(Continue reading)

Al Viro | 1 Jul 2010 20:56
Picon

[WTFoTW] ->quota_on() deadlocks

	All quotactl callbacks are done with s_umount held shared.
Fine, but ->quota_on() will do kern_path() and _that_ can try to
grab the same thing exclusive - suppose we pass a pathname that
walks into autofs and triggers mounting of the same fs (at a different
mountpoint, that is).  That'll end up calling sget(), finding our
superblock and trying to grab s_umount on it.  mount(8) sits
uninterruptibly sleeping in mount(2), kern_path() waits for it
to complete and that's not going to happen until the caller of
kern_path() (do_quotactl(), ultimately) finishes.

	Obvious solution is b0rken - we _can't_ take the call of
kern_path() to a point prior to getting (and locking) the superblock.
Why?  Because ocfs2 ignores the pathname argument, so failing on
bogus pathnames will blow the userland API compatibility.

	Other alternatives are also not particulary pleasant since
we need s_umount at some point there - we want some exclusion with
remounting.

	Ideas?
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Mimi Zohar | 1 Jul 2010 21:07
Picon

[PATCH] security: move LSM xattrnames to xattr.h

Make the security extended attributes names global. Updated to move
the remaining Smack xattrs.

Signed-off-by: Mimi Zohar <zohar <at> us.ibm.com>
Acked-by: Serge Hallyn <serue <at> us.ibm.com>
---
 include/linux/capability.h |    3 ---
 include/linux/xattr.h      |   14 ++++++++++++++
 security/selinux/hooks.c   |    3 ---
 security/smack/smack.h     |   10 ----------
 4 files changed, 14 insertions(+), 16 deletions(-)

diff --git a/include/linux/capability.h b/include/linux/capability.h
index 39e5ff5..90012b9 100644
--- a/include/linux/capability.h
+++ b/include/linux/capability.h
 <at>  <at>  -49,9 +49,6  <at>  <at>  typedef struct __user_cap_data_struct {
 } __user *cap_user_data_t;

 
-#define XATTR_CAPS_SUFFIX "capability"
-#define XATTR_NAME_CAPS XATTR_SECURITY_PREFIX XATTR_CAPS_SUFFIX
-
 #define VFS_CAP_REVISION_MASK	0xFF000000
 #define VFS_CAP_REVISION_SHIFT	24
 #define VFS_CAP_FLAGS_MASK	~VFS_CAP_REVISION_MASK
diff --git a/include/linux/xattr.h b/include/linux/xattr.h
index 0cfa1e9..f1e5bde 100644
--- a/include/linux/xattr.h
+++ b/include/linux/xattr.h
(Continue reading)

Neil Brown | 1 Jul 2010 22:41
Picon
Gravatar

Re: [PATCH -V14 0/11] Generic name to handle and open by handle syscalls

On Thu, 01 Jul 2010 21:58:54 +0530
"Aneesh Kumar K. V" <aneesh.kumar <at> linux.vnet.ibm.com> wrote:

> On Tue, 15 Jun 2010 22:42:50 +0530, "Aneesh Kumar K.V" <aneesh.kumar <at> linux.vnet.ibm.com> wrote:
> 
> Hi Al,
> 
> Any chance of getting this reviewed/merged in the next merge window ?

 My own opinion of the patchset is that the code itself is fine,
 however there is one part of the interface that bothers me.

 I think that it is a little ugly that filesystem uuid extraction is so
 closely tied to filehandle manipulation.  They are certainly related, and we
 certainly need to be able to get the filesystem uuid directly from the
 filesystem, but given that filehandle -> fd mapping doesn't (and shouldn't)
 use the uuid, the fact that fd/name -> filehandle mapping does return the
 uuid looks like it is simply piggy backing some functionality on the side,
 rather than creating a properly designed and general interface.

 I would feel happier about the patches if you removed all reference to uuids
 and then found some other way to ask a filesystem what its uuid was.

 This is not an issue that would make be want to stop the patches going
 upstream, but it does hold me back from offering a reviewed-by or
 acked-by (for whatever they might be worth).

NeilBrown

--
(Continue reading)


Gmane