Hugh Dickins | 1 Dec 02:18 2008

Re: [patch 1/2] mm: pagecache allocation gfp fixes

On Fri, 28 Nov 2008, Nick Piggin wrote:
> On Thu, Nov 27, 2008 at 06:14:18PM +0000, Hugh Dickins wrote:
> > On Thu, 27 Nov 2008, Nick Piggin wrote:
> > > On Thu, Nov 27, 2008 at 11:52:40AM +0200, Pekka Enberg wrote:
> > > > > -               err = add_to_page_cache_lru(page, mapping, index, gfp_mask);
> > > > > +               err = add_to_page_cache_lru(page, mapping, index,
> > > > > +                       (gfp_mask & (__GFP_FS|__GFP_IO|__GFP_WAIT|__GFP_HIGH)));
> > > > 
> > > > Can we use GFP_RECLAIM_MASK here? I mean, surely we need to pass
> > > > __GFP_NOFAIL, for example, down to radix_tree_preload() et al?
> > 
> > I certainly agree with Pekka's suggestion to use GFP_RECLAIM_MASK.
> > 
> > > 
> > > Updated patch.
> > 
> > I'm not sure about it.  I came here before 2.6.25, not yet got around
> > to submitting, I went in the opposite direction.  What drove me was an
> > irritation at the growing number of & ~__GFP_HIGHMEMs (after adding a
> > couple myself in shmem.c).  At the least, I think we ought to change
> > those to & GFP_RECLAIM_MASKs (it seems we don't have one, but can
> > imagine a block driver that wants GFP_DMA or GFP_DMA32 for pagecache:
> > there's no reason to alloc its kernel-internal data structures for DMA).
> > 
> > My patch went the opposite direction to yours, in that I was pushing
> > down the GFP_RECLAIM_MASKing into lib/radix-tree.c and mm/memcontrol.c
> > (but that now doesn't kmalloc for itself, so no longer needs the mask).
> > 
> > I'm not sure which way is the right way: you can say I'm inconsistent
> > not to push it down into slab/sleb/slib/slob/slub itself, but I've a
(Continue reading)

KAMEZAWA Hiroyuki | 1 Dec 02:50 2008

Re: [patch 1/2] mm: pagecache allocation gfp fixes

On Mon, 1 Dec 2008 01:18:09 +0000 (GMT)
Hugh Dickins <hugh <at> veritas.com> wrote:

> On Fri, 28 Nov 2008, Nick Piggin wrote:
> > On Thu, Nov 27, 2008 at 06:14:18PM +0000, Hugh Dickins wrote:
> > > On Thu, 27 Nov 2008, Nick Piggin wrote:
> > > > On Thu, Nov 27, 2008 at 11:52:40AM +0200, Pekka Enberg wrote:
> > > > > > -               err = add_to_page_cache_lru(page, mapping, index, gfp_mask);
> > > > > > +               err = add_to_page_cache_lru(page, mapping, index,
> > > > > > +                       (gfp_mask & (__GFP_FS|__GFP_IO|__GFP_WAIT|__GFP_HIGH)));
> > > > > 
> > > > > Can we use GFP_RECLAIM_MASK here? I mean, surely we need to pass
> > > > > __GFP_NOFAIL, for example, down to radix_tree_preload() et al?
> > > 
> > > I certainly agree with Pekka's suggestion to use GFP_RECLAIM_MASK.
> > > 
> > > > 
> > > > Updated patch.
> > > 
> > > I'm not sure about it.  I came here before 2.6.25, not yet got around
> > > to submitting, I went in the opposite direction.  What drove me was an
> > > irritation at the growing number of & ~__GFP_HIGHMEMs (after adding a
> > > couple myself in shmem.c).  At the least, I think we ought to change
> > > those to & GFP_RECLAIM_MASKs (it seems we don't have one, but can
> > > imagine a block driver that wants GFP_DMA or GFP_DMA32 for pagecache:
> > > there's no reason to alloc its kernel-internal data structures for DMA).
> > > 
> > > My patch went the opposite direction to yours, in that I was pushing
> > > down the GFP_RECLAIM_MASKing into lib/radix-tree.c and mm/memcontrol.c
> > > (but that now doesn't kmalloc for itself, so no longer needs the mask).
(Continue reading)

hooanon05 | 1 Dec 04:57 2008
Picon

Re: [rfc git patch] union directory


Hello Miklos,

Miklos Szeredi:
> The plan is to get a simple kernel implementation first which caches
> the directory in 'struct file'.

If you have not looked at the source code of AUFS, try it out.
It caches the directory entries in 'struct file.'

approach: http://marc.info/?l=linux-kernel&m=118915086030712&w=2
dir.[ch]: http://marc.info/?l=linux-kernel&m=121134143206375&w=2
vdir.c: http://marc.info/?l=linux-kernel&m=121134246707486&w=2

J. R. Okajima
Bharata B Rao | 1 Dec 05:35 2008
Picon

Re: [rfc git patch] union directory

On Sat, Nov 29, 2008 at 05:33:19PM +0100, Miklos Szeredi wrote:
> On Fri, 28 Nov 2008, Bharata B Rao wrote:
> > On Fri, Nov 28, 2008 at 12:01:32PM +0100, Miklos Szeredi wrote:
> > > I've been doing some small fixing/cleanup work on the union directory
> > > patches by Jan, and just noticed there's a thread about the union
> > > mounts on LKML, so I thought publicizing won't hurt.
> > 
> > Interesting that you call it "union directory", do you have plans to go
> > the Plan 9's way of union directories ?
> 
> I think yes, although I never tried Plan9 and don't know the details
> of the union directory semantics.
> 
> At first we thought of providing completely read-only unioning (no
> whiteouts, no object creation/removal).  This gets rid of a _lot_ of
> complexity.
> 
> > > It's still a work in progress, notably the readdir code currently only
> > > works on a few specific filesystem types.
> > 
> > readdir was one of the things on which we couldn't reach a consensus
> > on how to do it the right way. We were suggested that we move the
> > duplicate elimination into user space and an effort towards this was
> > also done (Ref: http://lkml.org/lkml/2008/4/29/248) by moving this
> > to glibc readdir. But we weren't sure how this could work for NFS and
> > were told that it is required to get the NFS side of things
> > sorted out first. So that's where readdir effort stands now afaik.
> > Do you have any ideas/plans on this front ?
> 
> The plan is to get a simple kernel implementation first which caches
(Continue reading)

KAMEZAWA Hiroyuki | 1 Dec 08:39 2008

Re: [patch 1/2] mm: pagecache allocation gfp fixes

On Mon, 1 Dec 2008 10:50:38 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu <at> jp.fujitsu.com> wrote:

> On Mon, 1 Dec 2008 01:18:09 +0000 (GMT)
> Hugh Dickins <hugh <at> veritas.com> wrote:
>
> It comes from the fact that memcg reclaims memory not because of memory shortage
> but of memory limit.
> "From which zone the memory should be reclaimed" is not problem. 
> I used GFP_HIGHUSER_MOVABLE to show "reclaim from anyware" in explicit way.
> too bad ?
> 
Maybe I got your point..

Hmm...but...

mmotm-Nov29's following gfp_mask is buggy (mis leading).
==
int add_to_page_cache_locked(struct page *page, struct address_space *mapping,
                pgoff_t offset, gfp_t gfp_mask)
{
        int error;

        error = mem_cgroup_cache_charge(page, current->mm,
                                        gfp_mask & ~__GFP_HIGHMEM);
        if (error)
==

mem_cgroup_cache_charge() has to reclaim memory from HIGHMEM (if used.) 
to make room. (not to reclaim memory from some specified area.)
(Continue reading)

Bharata B Rao | 1 Dec 08:59 2008
Picon

Re: [rfc git patch] union directory

On Mon, Dec 01, 2008 at 10:05:09AM +0530, Bharata B Rao wrote:
> On Sat, Nov 29, 2008 at 05:33:19PM +0100, Miklos Szeredi wrote:
> > On Fri, 28 Nov 2008, Bharata B Rao wrote:
> > 
> > The plan is to get a simple kernel implementation first which caches
> > the directory in 'struct file'.
> 
> FYI, I did this for a version of Union Mount.
> (http://lkml.org/lkml/2007/6/20/21) I was maintaining the readdir cache in
> struct file and the cache was persistant across readdir calls.
> 
> Do you have anything different in mind ?

The version pointed to by me above had the entire cache stored
in the struct file of the topmost directory. I guess your plan
is to cache them per branch and combine them during
readdir()/getdents() ?

Regards,
Bharata.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Nick Piggin | 1 Dec 09:33 2008
Picon

[patch][rfc] fs: shrink struct dentry

Hi,
Comments?
Thanks,
Nick

--
struct dentry is one of the most critical structures in the kernel. So it's
sad to see it going neglected.

With CONFIG_PROFILING turned on (which is probably the common case at least
for distros and kernel developers), sizeof(struct dcache) == 208 here
(64-bit). This gives 19 objects per slab.

I packed d_mounted into a hole, and took another 4 bytes off the inline
name length to take the padding out from the end of the structure. This
shinks it to 200 bytes. I could have gone the other way and increased the
length to 40, but I'm aiming for a magic number, read on...

I then got rid of the d_cookie pointer. This shrinks it to 192 bytes. Rant:
why was this ever a good idea? The cookie system should increase its hash
size or use a tree or something if lookups are a problem. Also the "fast
dcookie lookups" in oprofile should be moved into the dcookie code -- how
can oprofile possibly care about the dcookie_mutex? It gets dropped after
get_dcookie() returns so it can't be providing any sort of protection.

At 192 bytes, 21 objects fit into a 4K page, saving about 3MB on my system
with ~140 000 entries allocated. 192 is also a multiple of 64, so we get
nice cacheline alignment on 64 and 32 byte line systems -- any given dentry
will now require 3 cachelines to touch all fields wheras previously it
would require 4.
(Continue reading)

Nick Piggin | 1 Dec 09:44 2008
Picon

Re: [linux-cifs-client] Re: fsx-linux failing with latest cifs-2.6 git tree

On Sun, Nov 30, 2008 at 05:17:34PM -0500, Jeff Layton wrote:
> On Sun, 30 Nov 2008 15:44:21 -0600
> "Steve French" <smfrench <at> gmail.com> wrote:
> 
> > On Fri, Nov 28, 2008 at 6:18 AM, Jeff Layton <jlayton <at> redhat.com> wrote:
> > >> One minor thing -- you could do the !PageUptodate check first? If the
> > >> page is already uptodate, then everything is much simpler I think? (and
> > >> PageChecked should not be set).
> > >>
> > >> if (!PageUptodate(page)) {
> > >>     if (PageChecked(page)) {
> > >>         if (copied == len)
> > >>             SetPageUptodate(page);
> > >>         ClearPageChecked(page);
> > >>     } else if (copied == PAGE_CACHE_SIZE)
> > >>         SetPageUptodate(page);
> > >> }
> > >>
> > >> I don't know if you think that's better or not, but I really like to
> > >> make it clear that this is the !PageUptodate logic, and we never try
> > >> to SetPageUptodate on an already uptodate page.
> > >>
> > >> But I guess it is just a matter of style. So go with whatever you like
> > >> best.
> > > --------------[snip]---------------
> > > Subject: [PATCH] cifs: clean up conditionals in cifs_write_end
> > >
> > > Make it clear that the conditionals at the start of cifs_write_end are
> > > just for the situation when the page is not uptodate.
> > >
(Continue reading)

Christian Borntraeger | 1 Dec 09:57 2008
Picon

Re: [PATCH v2]: check for fops->owner in anon_inode_getfd

Am Donnerstag, 27. November 2008 schrieb Davide Libenzi:
> > ===================================================================
> > --- kvm.orig/fs/anon_inodes.c
> > +++ kvm/fs/anon_inodes.c
> >  <at>  <at>  -79,9 +79,12  <at>  <at>  int anon_inode_getfd(const char *name, c
> >  	if (IS_ERR(anon_inode_inode))
> >  		return -ENODEV;
> >  
> > +	if (fops->owner && !try_module_get(fops->owner))
> > +		return -ENOENT;
> > +
> >  	error = get_unused_fd_flags(flags);
> >  	if (error < 0)
> > -		return error;
> > +		goto err_module;
> >  	fd = error;
> >  
> >  	/*
> >  <at>  <at>  -128,6 +131,8  <at>  <at>  err_dput:
> >  	dput(dentry);
> >  err_put_unused_fd:
> >  	put_unused_fd(fd);
> > +err_module:
> > +	module_put(fops->owner);
> >  	return error;
> >  }
> >  EXPORT_SYMBOL_GPL(anon_inode_getfd);
> 
> Looks OK to me.

(Continue reading)

Warren Turkal | 1 Dec 11:28 2008

Re: [PATCH 2/2] Fix journal detection on HFS+.

I have a drive that has the property of having the journal bit set and
having 0 in the journal_info_block field that mounts in rw on MacOS X.
That's how I discovered the problem.

wt

On Thu, Nov 27, 2008 at 8:21 AM, Roman Zippel <zippel <at> linux-m68k.org> wrote:
> Hi,
>
> On Tue, 25 Nov 2008, Warren Turkal wrote:
>
>> On Sat, Nov 22, 2008 at 8:52 PM, Roman Zippel <zippel <at> linux-m68k.org> wrote:
>> > I'm curious how common it is to have the journal bit set but no journal block,
>> > I haven't seen this case so far.
>>
>> It's so uncommon that the technote for HFS+ doesn't mention it.
>> However, I did find [1], and it's (c) by Apple. Look at the comment at
>> [2] to see what tipped me off.
>
> That likely also means the journal bit is cleared. If you want to verify
> this, you should check the actual kernel source and there I can't find
> anywhere, that they handle a zero journal block specially, thus it will
> result in a failure to replay the journal, so I don't see why we should
> allow write access to the volume in this case.
>
>> > IMO more useful would be to read the journal block and check if there is
>> > anything that needs to be replayed.
>> > If you're interested in a second step you could replay the journal, it's not
>> > that difficult to do, it's pretty much just copying blocks around.
>>
(Continue reading)


Gmane