Bernd Schubert | 1 Mar 2009 01:38

Re: [fuse-devel] delta filesystem prototype

On Saturday 28 February 2009, Goswin von Brederlow wrote:
> Miklos Szeredi <miklos <at> szeredi.hu> writes:
> > Here is my first try at a "delta" filesystem.  It takes two
> > directories, one of which is a read-only base, and the other is where
> > the differences are stored.  It stores data, metadata and directory
> > modifications without copying up whole files from the read-only
> > branch.
> >
> > The layout of the delta store may look similar to the writable branch
> > of a union fs, but this is basically just coincidence (it was easier
> > to start out this way).
> >
> > Currently it's implemented with fuse and it's not optimized at all, so
> > performance may suck in some cases.  But I think this is a useful
> > concept and a better model, than trying to fit writable branches into
> > a union filesystem.
> >
> > Comments, bug reports are welcome.
> >
> > Thanks,
> > Miklos
>
> Wouldn't it make more sense to start with unionfs-fuse and add a delta
> feature to it? unionfs-fuse already has all you need except that it
> will copy the whole file (if on a read-only branch) on write.
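
An illustration of the difference discussed above (not Miklos's actual code): a minimal sketch in C, assuming a file is split into fixed-size blocks and the delta store records only the blocks that were written. The names `delta_file`, `delta_write_block` and `delta_read_block`, and the tiny block size, are hypothetical. A whole-file copy-up, as described for unionfs-fuse, would instead copy every block on the first write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 4   /* unrealistically small, to keep the example readable */
#define NUM_BLOCKS 4

/* Hypothetical in-memory model of one file: read-only base plus sparse delta. */
struct delta_file {
    const char *base;                    /* read-only base contents, never modified */
    char delta[NUM_BLOCKS][BLOCK_SIZE];  /* storage for modified blocks only */
    bool present[NUM_BLOCKS];            /* true if the block lives in the delta */
};

/* A write touches only the delta store; the base is left alone. */
static void delta_write_block(struct delta_file *f, size_t blk, const char *data)
{
    assert(blk < NUM_BLOCKS);
    memcpy(f->delta[blk], data, BLOCK_SIZE);
    f->present[blk] = true;
}

/* A read prefers the delta and falls back to the base for untouched blocks. */
static void delta_read_block(const struct delta_file *f, size_t blk, char *out)
{
    assert(blk < NUM_BLOCKS);
    const char *src = f->present[blk] ? f->delta[blk]
                                      : f->base + blk * BLOCK_SIZE;
    memcpy(out, src, BLOCK_SIZE);
}
```

With a base of "aaaabbbbccccdddd", writing "XXXX" to block 1 stores a single block in the delta; blocks 0, 2 and 3 are still read from the base. This also shows why later changes to the base are visible through any block that has no delta entry.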

Well yes, but it would need to be configurable by the user. IMMHO, the 'delta'
approach has a big problem - what happens if the admin decides to modify the
underlying ro-branch, which is a distribution chroot seen by all clients as 
their '/'? Any time files may be modified or even deleted on this branch when 

Nick Piggin | 1 Mar 2009 03:38

Re: [patch][rfc] mm: new address space calls

On Sat, Feb 28, 2009 at 06:19:56PM -0500, Christoph Hellwig wrote:
> On Wed, Feb 25, 2009 at 03:59:57PM -0500, Chris Mason wrote:
> > One problem I have with the btrfs extent state code is that I might
> > choose to release the extent state in releasepage, but the VM might not
> > choose to free the page.  So I've got an up to date page without any of
> > the rest of my state.
> > 
> > Which of these ops covers that? ;)  I'd love to help better document the
> > requirements for these callbacks, I find it confusing every time.
> 
> releasepage has also another problem.  It only gets called after
> discard_buffer discarded lots of valuable information from the buffers,
> which gets XFS into really bad trouble as that drops information if
> there is a delalloc extent.

Then I think it just needs to provide its own invalidatepage?

> I'd really like to see some major overhaul in that area, and that also
> extends to documentation (or just naming, why is block_invalidatepage
> calling into a method called ->releasepage, but there also is a
> ->invalidatepage that gets called from truncate*page routines..)

Those convoluted call paths are really bloody annoying.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Nick Piggin | 1 Mar 2009 03:45

Re: [patch][rfc] mm: new address space calls

On Sat, Feb 28, 2009 at 06:24:21PM -0500, Christoph Hellwig wrote:
> On Wed, Feb 25, 2009 at 11:48:39AM +0100, Nick Piggin wrote:
> > This is about the last change to generic code I need for fsblock.
> > Comments?
> > 
> > Introduce new address space operations sync and release, which can be used
> > by a filesystem to synchronize and release per-address_space private metadata.
> > They generalise sync_mapping_buffers, invalidate_inode_buffers, and
> > remove_inode_buffers calls, and get another step closer to divorcing
> > buffer heads from core mm/fs code.
> 
> >  void invalidate_inode_buffers(struct inode *inode)
> >  {
> > -	if (inode_has_buffers(inode)) {
> > -		struct address_space *mapping = &inode->i_data;
> > +	struct address_space *mapping = &inode->i_data;
> > +
> > +	if (mapping_has_private(mapping)) {
> >  		struct list_head *list = &mapping->private_list;
> >  		struct address_space *buffer_mapping = mapping->assoc_mapping;
> 
> It's not really helping much here as we still directly poke into the
> buffer_head list.

This is in fs/buffer.c.

Or do you object to the definition of mapping_has_private? Yes that
still checks the private_list, but it would be trivial to convert it
over to checking a bit in the mapping now. I just didn't do it because
fsblock also uses the private_list.
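
A rough userspace sketch of the two checks being contrasted here, assuming a cut-down stand-in for the mapping. This is not the code from the patch: `model_mapping`, `MODEL_AS_HAS_PRIVATE` and the helper names are hypothetical, and the list helpers only imitate the kernel's `struct list_head`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular list head, modelled on the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

static bool list_empty(const struct list_head *h)
{
    return h->next == h;
}

static void list_add(struct list_head *n, struct list_head *h)
{
    n->next = h->next;
    n->prev = h;
    h->next->prev = n;
    h->next = n;
}

#define MODEL_AS_HAS_PRIVATE (1UL << 0)   /* hypothetical flag bit */

/* Hypothetical stand-in for the fields of the mapping that matter here. */
struct model_mapping {
    unsigned long flags;
    struct list_head private_list;
};

/* Variant 1: "has private metadata" means the private list is non-empty. */
static bool mapping_has_private_list(const struct model_mapping *m)
{
    return !list_empty(&m->private_list);
}

/* Variant 2: the same question answered by testing a bit in the mapping. */
static bool mapping_has_private_flag(const struct model_mapping *m)
{
    return (m->flags & MODEL_AS_HAS_PRIVATE) != 0;
}

/* Attaching private metadata keeps both representations in sync. */
static void mapping_attach_private(struct model_mapping *m, struct list_head *item)
{
    assert(item != NULL);
    list_add(item, &m->private_list);
    m->flags |= MODEL_AS_HAS_PRIVATE;
}
```

The flag variant lets a caller ask the question without knowing how the private metadata is stored, at the cost of having to set and clear the bit wherever the list changes.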

Nick Piggin | 1 Mar 2009 03:50

Re: [rfc][patch 2/5] fsblock: fsblock proper

On Sat, Feb 28, 2009 at 12:40:32PM +0100, Nick Piggin wrote:
> This is the core fsblock code. It also touches a few other little things which
> I should break out, but can basically be ignored.
> 
> Non-fsblock changes:
> fs-writeback.c, page-writeback.c, backing-dev.h: minor changes to support my
> bdflush flusher experiment (flushing data and metadata together based on bdev
> rather than pdflush looping over inodes etc, but this is disabled by default
> unless you uncomment BDFLUSH_FLUSHING in fsblock_types.h).
>  
> main.c: fsblock_init();
> 
> sysctl.c: sysctl disable fsblock freeing on 0 refcount. Just helps comparison.
> 
> truncate.c: should effectively be a noop... some leftover stuff to fix
>             superpage block truncation but it isn't quite finished.
> 
> page-flags.h: PageBlocks alias for PagePrivate, and some debugging stuff.

This seems to have been eaten by vger, so I'll attach a gzip.

Attachment (fsblock.patch.gz): application/x-gzip, 29 KiB

Dave Chinner | 1 Mar 2009 09:17

Re: [patch][rfc] mm: hold page lock over page_mkwrite

On Wed, Feb 25, 2009 at 10:36:29AM +0100, Nick Piggin wrote:
> I need this in fsblock because I am working to ensure filesystem metadata
> can be correctly allocated and refcounted. This means that page cleaning
> should not require memory allocation (to be really robust).

Which, unfortunately, is just a dream for any filesystem that uses
delayed allocation. i.e. they have to walk the free space trees
which may need to be read from disk and therefore require memory
to succeed....

Cheers,

Dave.
-- 
Dave Chinner
david <at> fromorbit.com

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo <at> kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .

Goswin von Brederlow | 1 Mar 2009 11:17

Re: [fuse-devel] delta filesystem prototype

Bernd Schubert <bs_lists <at> aakef.fastmail.fm> writes:

> On Saturday 28 February 2009, Goswin von Brederlow wrote:
>> Miklos Szeredi <miklos <at> szeredi.hu> writes:
>> > Here is my first try at a "delta" filesystem.  It takes two
>> > directories, one of which is a read-only base, and the other is where
>> > the differences are stored.  It stores data, metadata and directory
>> > modifications without copying up whole files from the read-only
>> > branch.
>> >
>> > The layout of the delta store may look similar to the writable branch
>> > of a union fs, but this is basically just coincidence (it was easier
>> > to start out this way).
>> >
>> > Currently it's implemented with fuse and it's not optimized at all, so
>> > performance may suck in some cases.  But I think this is a useful
>> > concept and a better model, than trying to fit writable branches into
>> > a union filesystem.
>> >
>> > Comments, bug reports are welcome.
>> >
>> > Thanks,
>> > Miklos
>>
>> Wouldn't it make more sense to start with unionfs-fuse and add a delta
>> feature to it? unionfs-fuse already has all you need except that it
>> will copy the whole file (if on a read-only branch) on write.
>
> Well yes, but it would need to be configurable by the user. IMMHO, the 'delta'
> approach has a big problem - what happens if the admin decides to modify the

Boaz Harrosh | 1 Mar 2009 11:43

Re: [osd-dev] [PATCH 1/8] exofs: Kbuild, Headers and osd utils

FUJITA Tomonori wrote:
> On Tue, 17 Feb 2009 10:10:15 +0200
> Boaz Harrosh <bharrosh <at> panasas.com> wrote:
> 
>> FUJITA Tomonori wrote:
>>> Can you stop the argument, "exofs is similar to the existing
>>> traditional file systems hence it should be treated equally". It's
>>> simply untrue. Does anyone except for panasas people insist the same
>>> argument?
>>>
>> No I will not, it is true. exofs is just a regular old filesystem
>> nothing different.
> 
> After reading this, I gave up discussing this issue with you but I
> still wait for your fixes that you promised:
> 
> http://marc.info/?l=linux-scsi&m=123445759718253&w=2
> 
> 
> Thanks,
> --

They are on the way, I have not forgotten

Boaz

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Nick Piggin | 1 Mar 2009 14:50

Re: [patch][rfc] mm: hold page lock over page_mkwrite

On Sun, Mar 01, 2009 at 07:17:44PM +1100, Dave Chinner wrote:
> On Wed, Feb 25, 2009 at 10:36:29AM +0100, Nick Piggin wrote:
> > I need this in fsblock because I am working to ensure filesystem metadata
> > can be correctly allocated and refcounted. This means that page cleaning
> > should not require memory allocation (to be really robust).
> 
> Which, unfortunately, is just a dream for any filesystem that uses
> delayed allocation. i.e. they have to walk the free space trees
> which may need to be read from disk and therefore require memory
> to succeed....

Well it's a dream because probably none of them get it right, but
that doesn't mean it's impossible.

You don't need complete memory allocation up-front to be robust,
but having reserves or degraded modes that simply guarantee
forward progress is enough.

For example, if you need to read/write filesystem metadata to find
and allocate free space, then you really only need a page to do all
the IO.
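
To make the reserve idea concrete, here is a minimal userspace sketch, assuming a single preallocated page that is handed out only when the normal allocator fails. It is an illustration, not fsblock code: `page_reserve`, `reserve_get_page`, `reserve_put_page` and `alloc_always_fails` are hypothetical names, and `malloc` stands in for a page allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define PAGE_BYTES 4096

/* One page set aside while memory is plentiful, used only as a fallback. */
struct page_reserve {
    void *reserved;
    bool in_use;
};

static bool reserve_init(struct page_reserve *r)
{
    r->reserved = malloc(PAGE_BYTES);
    r->in_use = false;
    return r->reserved != NULL;
}

/* Simulation helper: an allocator that behaves as if memory is exhausted. */
static void *alloc_always_fails(size_t n)
{
    (void)n;
    return NULL;
}

/*
 * Try the normal allocator first; fall back to the reserved page.
 * Returns NULL only while the reserved page is already in flight, in which
 * case the caller waits for that I/O to finish and retries.
 */
static void *reserve_get_page(struct page_reserve *r, void *(*try_alloc)(size_t))
{
    void *p = try_alloc(PAGE_BYTES);

    if (p)
        return p;
    if (r->in_use)
        return NULL;
    r->in_use = true;
    return r->reserved;
}

/* Returning the reserved page makes it available for the next I/O. */
static void reserve_put_page(struct page_reserve *r, void *p)
{
    if (p == r->reserved)
        r->in_use = false;
    else
        free(p);
}

static void reserve_destroy(struct page_reserve *r)
{
    assert(!r->in_use);
    free(r->reserved);
    r->reserved = NULL;
}
```

In this degraded mode the metadata I/O is serialised through the one page, which is slow but always makes forward progress as long as each I/O eventually completes and returns the page. The kernel's mempool facility is built around the same kind of guarantee.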


Dave Chinner | 2 Mar 2009 09:19

Re: [patch][rfc] mm: hold page lock over page_mkwrite

On Sun, Mar 01, 2009 at 02:50:57PM +0100, Nick Piggin wrote:
> On Sun, Mar 01, 2009 at 07:17:44PM +1100, Dave Chinner wrote:
> > On Wed, Feb 25, 2009 at 10:36:29AM +0100, Nick Piggin wrote:
> > > I need this in fsblock because I am working to ensure filesystem metadata
> > > can be correctly allocated and refcounted. This means that page cleaning
> > > should not require memory allocation (to be really robust).
> > 
> > Which, unfortunately, is just a dream for any filesystem that uses
> > delayed allocation. i.e. they have to walk the free space trees
> > which may need to be read from disk and therefore require memory
> > to succeed....
> 
> Well it's a dream because probably none of them get it right, but
> that doesn't mean it's impossible.
> 
> You don't need complete memory allocation up-front to be robust,
> but having reserves or degraded modes that simply guarantee
> forward progress is enough.
> 
> For example, if you need to read/write filesystem metadata to find
> and allocate free space, then you really only need a page to do all
> the IO.

For journalling filesystems, dirty metadata is pinned for at least the
duration of the transaction and in many cases it is pinned for
multiple transactions (i.e. in memory aggregation of commits like
XFS does). And then once the transaction is complete, it can't be
reused until it is written to disk.

For the worst case usage in XFS, think about a complete btree split

Nick Piggin | 2 Mar 2009 09:37

Re: [patch][rfc] mm: hold page lock over page_mkwrite

On Mon, Mar 02, 2009 at 07:19:53PM +1100, Dave Chinner wrote:
> On Sun, Mar 01, 2009 at 02:50:57PM +0100, Nick Piggin wrote:
> > On Sun, Mar 01, 2009 at 07:17:44PM +1100, Dave Chinner wrote:
> > > On Wed, Feb 25, 2009 at 10:36:29AM +0100, Nick Piggin wrote:
> > > > I need this in fsblock because I am working to ensure filesystem metadata
> > > > can be correctly allocated and refcounted. This means that page cleaning
> > > > should not require memory allocation (to be really robust).
> > > 
> > > Which, unfortunately, is just a dream for any filesystem that uses
> > > delayed allocation. i.e. they have to walk the free space trees
> > > which may need to be read from disk and therefore require memory
> > > to succeed....
> > 
> > Well it's a dream because probably none of them get it right, but
> > that doesn't mean it's impossible.
> > 
> > You don't need complete memory allocation up-front to be robust,
> > but having reserves or degraded modes that simply guarantee
> > forward progress is enough.
> > 
> > For example, if you need to read/write filesystem metadata to find
> > and allocate free space, then you really only need a page to do all
> > the IO.
> 
> For journalling filesystems, dirty metadata is pinned for at least the
> duration of the transaction and in many cases it is pinned for
> multiple transactions (i.e. in memory aggregation of commits like
> XFS does). And then once the transaction is complete, it can't be
> reused until it is written to disk.
> 