Daniel Phillips | 10 Nov 11:20

Deferred Namespace Operations

Cache layering is a central idea in the Tux3 atomic update model.  The
cache "front end" consists of data blocks in the inode page cache, inode
attributes in the inode cache and file names in the dentry cache.  The
cache "back end" consists of cached file index blocks, inode table
blocks, and inode table index blocks.  Applications directly modify the
front end cache via syscalls and memory maps, while the back end cache
is modified only by the filesystem, in the process of permanently
encoding the changes held in cache to disk.

The great promise of such a layering is to allow "bumpless" operation.  
So long as sufficient cache memory is available and any needed metadata 
blocks have been read into cache, the front end does not need to wait 
for the back end to complete its work.  It just returns immediately to 
its caller after updating a few VFS cache objects, without needing to 
locate and update cached disk blocks as well.
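To make the front end/back end split concrete, here is a minimal C sketch. The names (`struct cached_inode`, `frontend_truncate()`, `backend_encode()`) are hypothetical and are not Tux3 code; the point is only that the front end path edits an in-memory object and returns at once, while the back end later does the disk-format work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical front end object: cached attributes of one inode. */
struct cached_inode {
	uint64_t size;
	uint64_t mtime;
	bool dirty;            /* changed in cache, not yet encoded */
};

/* Hypothetical count of cached disk blocks (inode table, index
 * blocks) that the back end has updated. */
static unsigned backend_block_updates;

/* Front end: touch only the VFS-level object, then return at once. */
static void frontend_truncate(struct cached_inode *inode, uint64_t size,
			      uint64_t now)
{
	inode->size = size;
	inode->mtime = now;
	inode->dirty = true;   /* leave the disk-format work to the back end */
}

/* Back end: encode dirty front end state into cached disk blocks. */
static void backend_encode(struct cached_inode *inode)
{
	if (!inode->dirty)
		return;
	backend_block_updates++;   /* stands in for editing inode table blocks */
	inode->dirty = false;
}
```

The front end call never touches `backend_block_updates`. Only the later `backend_encode()` does, and it does so once per dirty object however many front end updates preceded it.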

In broad outline, the concept is simple, clean and compelling.  In 
practice, there are issues to overcome.  First, some background.

Changes made to the front end cache are batched into "deltas"[1], where each
delta comprises all the changes required to represent some set of file
operations carried out by the front end, or equivalently, the changes
required to bring the filesystem state of the previous delta up to the
cache state as of the new delta.
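One way to picture delta batching, as an illustrative sketch only: every front end change is tagged with the number of the delta that is currently open, and a delta transition closes that delta so that later changes fall into the next one. `struct delta_state`, `record_change()` and `delta_transition()` are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHANGES 64

/* Hypothetical record of one front end change (e.g. a dirtied inode). */
struct change {
	unsigned delta;          /* delta this change was batched into */
	unsigned long inum;
};

struct delta_state {
	unsigned current;        /* delta currently accepting changes */
	struct change log[MAX_CHANGES];
	size_t count;
};

/* Front end: record a change against the open delta. */
static void record_change(struct delta_state *s, unsigned long inum)
{
	assert(s->count < MAX_CHANGES);
	s->log[s->count++] = (struct change){ .delta = s->current, .inum = inum };
}

/* Close the open delta; returns its number.  Later changes go to the next. */
static unsigned delta_transition(struct delta_state *s)
{
	return s->current++;
}

/* How many recorded changes belong to a given delta. */
static size_t changes_in_delta(const struct delta_state *s, unsigned delta)
{
	size_t n = 0;
	for (size_t i = 0; i < s->count; i++)
		if (s->log[i].delta == delta)
			n++;
	return n;
}
```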

Each new delta goes through a "setup" step that selects and assigns disk 
addresses for updated data blocks, modifies cached index blocks 
accordingly, and creates log blocks to specify index block changes 
logically.  Following setup, the block images of a delta are 
transferred to disk.  Finally, a delta commit block is written to
[...]
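The ordering described above (setup, then transfer of block images, then the commit block) can be sketched as a small state machine. This is a hypothetical model, not the Tux3 implementation. It only enforces that a delta cannot commit while any of its block images is still unwritten.

```c
#include <assert.h>
#include <stdbool.h>

enum delta_phase { DELTA_OPEN, DELTA_SETUP, DELTA_TRANSFER, DELTA_COMMITTED };

struct delta {
	enum delta_phase phase;
	unsigned blocks_pending;   /* block images not yet on disk */
};

/* Setup: choose disk addresses, edit cached index blocks, emit log
 * blocks.  Here it only records how many block images must be written. */
static void delta_setup(struct delta *d, unsigned nblocks)
{
	assert(d->phase == DELTA_OPEN);
	d->blocks_pending = nblocks;
	d->phase = DELTA_SETUP;
}

/* Transfer: called once for each block image that reaches disk. */
static void delta_block_written(struct delta *d)
{
	assert(d->phase == DELTA_SETUP || d->phase == DELTA_TRANSFER);
	assert(d->blocks_pending > 0);
	d->blocks_pending--;
	d->phase = DELTA_TRANSFER;
}

/* Commit: the commit block may be written only once every block image
 * of the delta is on disk.  Returns false if that is not yet true. */
static bool delta_commit(struct delta *d)
{
	if (d->phase != DELTA_SETUP && d->phase != DELTA_TRANSFER)
		return false;
	if (d->blocks_pending)
		return false;
	d->phase = DELTA_COMMITTED;
	return true;
}
```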

Maciej Żenczykowski | 10 Nov 22:56

Re: Deferred Namespace Operations

> two.  A file that is created and deleted before a delta transition
> takes place will not only never appear on disk, it will not even appear
> in a cached disk block.

I'd like to point out that if you create a file, open it, then delete
it, you can still use it to store temporary data; this is indeed
a common use case.  However, the amount of data may very well
exceed what you would be willing to keep in RAM, and thus you would
want to be able to write this data out to disk, even though the file
itself doesn't exist any more...  Some sort of swap-like behaviour???

Maciej
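For reference, the pattern Maciej describes looks like this with standard POSIX calls. The file is created and opened, its name is removed, and it stays usable through the open descriptor until the last close. The function name and the `/tmp` template are arbitrary choices for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Create a temporary file, unlink it while keeping it open, then use it.
 * Returns the number of bytes read back, or -1 on error. */
static long orphan_tempfile_roundtrip(size_t nbytes)
{
	char path[] = "/tmp/orphan-demo-XXXXXX";
	int fd = mkstemp(path);          /* create and open */
	if (fd < 0)
		return -1;
	unlink(path);                    /* name is gone; inode lives on via fd */

	char buf[4096];
	memset(buf, 'x', sizeof buf);
	size_t left = nbytes;
	while (left) {                   /* data may exceed what fits in RAM */
		size_t n = left < sizeof buf ? left : sizeof buf;
		ssize_t w = write(fd, buf, n);
		if (w <= 0) {
			close(fd);
			return -1;
		}
		left -= (size_t)w;
	}

	if (lseek(fd, 0, SEEK_SET) < 0) {
		close(fd);
		return -1;
	}
	long total = 0;
	ssize_t r;
	while ((r = read(fd, buf, sizeof buf)) > 0)
		total += r;
	close(fd);                       /* last reference: the orphan can be freed */
	return r < 0 ? -1 : total;
}
```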

OGAWA Hirofumi | 11 Nov 06:34

[PATCH 1/3] Use tuxtime() to update timestamp


# HG changeset patch
# User OGAWA Hirofumi <hirofumi <at> mail.parknet.co.jp>
# Date 1226380667 -32400
# Node ID 24f3b0f20899b2c5ad5554b7512f647f75c5d3be
# Parent  dab895e2e896189f1764a7ec3330b6473757138e
Use tuxtime() to update timestamp

This also moves the time functions to tux3.h, so they can be used
without including inode.c.

diff -r dab895e2e896 -r 24f3b0f20899 user/dir.c
--- a/user/dir.c	Mon Nov 10 20:15:37 2008 -0800
+++ b/user/dir.c	Tue Nov 11 14:17:47 2008 +0900
@@ -194,7 +194,11 @@
 	memcpy(entry->name, name, len);
 	entry->inum = cpu_to_le32(inum);
 	entry->type = ext2_type_by_mode[(mode & S_IFMT) >> STAT_SHIFT];
+#ifdef main
+	dir->i_mtime = dir->i_ctime = tuxtime();
+#else
 	dir->i_mtime = dir->i_ctime = CURRENT_TIME_SEC;
+#endif
 	mark_inode_dirty(dir);
 	offset = (void *)entry - buffer->data;
 	brelse_dirty(buffer);
diff -r dab895e2e896 -r 24f3b0f20899 user/inode.c
--- a/user/inode.c	Mon Nov 10 20:15:37 2008 -0800
+++ b/user/inode.c	Tue Nov 11 14:17:47 2008 +0900
@@ -16,26 +16,6 @@
[...]
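The patch relies on `tuxtime()` and `tuxtimeval()` in tux3.h, whose bodies are mostly elided here. What follows is a hedged reconstruction. It assumes `fixed32` is a 64-bit 32.32 fixed-point value (whole seconds in the high word, binary fraction in the low word), as `high32()` and the masking in `millionths()` suggest. Only the `millionths()` body is copied from the patch series (the `-` line of PATCH 3/3). The other bodies are guesses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

/* Assumption: 32.32 fixed point, whole seconds high, binary fraction low. */
typedef uint64_t fixed32;

/* Hypothetical reconstruction: pack seconds + microseconds into fixed32. */
static inline fixed32 tuxtimeval(uint32_t sec, uint32_t usec)
{
	assert(usec < 1000000);
	return ((fixed32)sec << 32) | (((fixed32)usec << 32) / 1000000);
}

/* Current wall-clock time; mirrors the tuxtimeval() call in the patch context. */
static inline fixed32 tuxtime(void)
{
	struct timeval now;
	gettimeofday(&now, NULL);
	return tuxtimeval((uint32_t)now.tv_sec, (uint32_t)now.tv_usec);
}

/* Hypothetical body: whole seconds are the high 32 bits. */
static inline uint32_t high32(fixed32 val)
{
	return val >> 32;
}

/* From the patch: fraction rounded to the nearest microsecond. */
static inline unsigned millionths(fixed32 val)
{
	return (((val & 0xffffffff) * 1000000) + 0x80000000) >> 32;
}
```

Because 2^31 exceeds 10^6, a microsecond value packed by this `tuxtimeval()` comes back unchanged from `millionths()`.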

OGAWA Hirofumi | 11 Nov 06:35

[PATCH 2/3] Convert more timestamp to high resolution in tux3fuse


# HG changeset patch
# User OGAWA Hirofumi <hirofumi <at> mail.parknet.co.jp>
# Date 1226380733 -32400
# Node ID 55013c67b722e7b218c348f7cda68c766c62fd31
# Parent  24f3b0f20899b2c5ad5554b7512f647f75c5d3be
Convert more timestamp to high resolution in tux3fuse

diff -r 24f3b0f20899 -r 55013c67b722 user/tux3fuse.c
--- a/user/tux3fuse.c	Tue Nov 11 14:17:47 2008 +0900
+++ b/user/tux3fuse.c	Tue Nov 11 14:18:53 2008 +0900
@@ -84,9 +84,24 @@
 			.attr = {
 				.st_ino   = inode->inum,
 				.st_mode  = inode->i_mode,
+#if 1
+				.st_atim  = {
+					.tv_sec  = high32(inode->i_atime),
+					.tv_nsec = millionths(inode->i_atime) * 1000,
+				},
+				.st_mtim  = {
+					.tv_sec  = high32(inode->i_mtime),
+					.tv_nsec = millionths(inode->i_mtime) * 1000,
+				},
+				.st_ctim  = {
+					.tv_sec  = high32(inode->i_ctime),
+					.tv_nsec = millionths(inode->i_ctime) * 1000,
+				},
+#else
 				.st_atime = high32(inode->i_atime),
[...]
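The initializers above compute `tv_nsec` as `millionths() * 1000`. A small sketch of the same conversion, assuming the 32.32 layout of `fixed32`, exposes one edge case worth checking. `millionths()` rounds to nearest, so a fraction just below one second yields 1000000, and `tv_nsec` would then become 1000000000, outside the valid range of 0 to 999999999. The helper `fixed32_to_timespec()` is hypothetical. It carries that overflow into `tv_sec`.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

typedef uint64_t fixed32;  /* assumed 32.32 fixed-point seconds */

static inline uint32_t high32(fixed32 val) { return val >> 32; }

static inline unsigned millionths(fixed32 val)
{
	return (((val & 0xffffffff) * 1000000) + 0x80000000) >> 32;
}

/* Hypothetical helper: what the st_atim/st_mtim/st_ctim initializers
 * compute, plus a carry so rounding never leaves tv_nsec == 1000000000. */
static struct timespec fixed32_to_timespec(fixed32 val)
{
	struct timespec ts = {
		.tv_sec  = high32(val),
		.tv_nsec = (long)millionths(val) * 1000,
	};
	if (ts.tv_nsec >= 1000000000L) {  /* fraction rounded up to a whole second */
		ts.tv_sec += 1;
		ts.tv_nsec -= 1000000000L;
	}
	return ts;
}
```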

OGAWA Hirofumi | 11 Nov 06:35

[PATCH 3/3] Add billionths() and use it, instead of millionths()


# HG changeset patch
# User OGAWA Hirofumi <hirofumi <at> mail.parknet.co.jp>
# Date 1226380749 -32400
# Node ID daac4717e4cc70df5efde5bf82df1841dcc99537
# Parent  55013c67b722e7b218c348f7cda68c766c62fd31
Add billionths() and use it, instead of millionths()

diff -r 55013c67b722 -r daac4717e4cc user/tux3.h
--- a/user/tux3.h	Tue Nov 11 14:18:53 2008 +0900
+++ b/user/tux3.h	Tue Nov 11 14:19:09 2008 +0900
@@ -254,9 +254,9 @@
 	return tuxtimeval(now.tv_sec, now.tv_usec);
 }

-static inline unsigned millionths(fixed32 val)
+static inline unsigned billionths(fixed32 val)
 {
-	return (((val & 0xffffffff) * 1000000) + 0x80000000) >> 32;
+	return ((((val & 0xffffffff) * 1000000) + 0x80000000) >> 32) * 1000;
 }

 static inline u32 high32(fixed32 val)
diff -r 55013c67b722 -r daac4717e4cc user/tux3fuse.c
--- a/user/tux3fuse.c	Tue Nov 11 14:18:53 2008 +0900
+++ b/user/tux3fuse.c	Tue Nov 11 14:19:09 2008 +0900
@@ -87,15 +87,15 @@
 #if 1
 				.st_atim  = {
 					.tv_sec  = high32(inode->i_atime),
[...]
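As written, `billionths()` computes the rounded microsecond value and multiplies it by 1000, so the result still has only microsecond resolution. Scaling the 32-bit fraction by 10^9 directly would give true nanoseconds. The sketch below compares both, again assuming a 32.32 `fixed32`. The names `billionths_patch()` and `billionths_exact()` are hypothetical, and the truncating variant is an alternative, not what the patch does.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t fixed32;  /* assumed 32.32 fixed-point seconds */

/* Same arithmetic as PATCH 3/3: microsecond rounding scaled by 1000,
 * so precision stays at 1 microsecond. */
static inline unsigned billionths_patch(fixed32 val)
{
	return ((((val & 0xffffffff) * 1000000) + 0x80000000) >> 32) * 1000;
}

/* Alternative: scale the 32-bit fraction by 10^9 and truncate.
 * (2^32 - 1) * 10^9 < 2^64, so the product fits in 64 bits. */
static inline unsigned billionths_exact(fixed32 val)
{
	return ((val & 0xffffffff) * 1000000000ULL) >> 32;
}
```

Truncating rather than rounding keeps the result at or below 999999999, so it can go straight into `tv_nsec`. The patch's arithmetic can reach 1000000000 for fractions just below one second.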

Daniel Phillips | 11 Nov 08:15

Re: Deferred Namespace Operations

On Monday 10 November 2008 13:56, Maciej Żenczykowski wrote:
> > two.  A file that is created and deleted before a delta transition
> > takes place will not only never appear on disk, it will not even appear
> > in a cached disk block.
> 
> I'd like to point out, that if you create a file, open it, then delete
> it, you can then still use it to store temporary data - this is indeed
> a common use case.  However the amount of data storage may very well
> exceed what you would be willing to store in ram, and thus you would
> want to be able to write this data out to disk, even though the file
> itself doesn't exist any more...  Some sort of swap-like behaviour???

You mean an orphan temporary file?  I think we just need to make sure
that works as it is supposed to.  It is reasonable for the data of
such a file to be transferred to disk just like that of any other file,
even though the file is unlinked.  We just need to be sure that it will
get cleaned up like any other orphan.

Daniel

_______________________________________________
Tux3 mailing list
Tux3 <at> tux3.org
http://tux3.org/cgi-bin/mailman/listinfo/tux3
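Cleaning up "like any other orphan" usually means tracking inodes that have lost their last name but are still open. Here is a hypothetical sketch of that bookkeeping. `do_unlink()` puts such an inode on an orphan list, `do_close()` frees it when the last handle goes away, and `orphan_replay()` stands in for recovery reclaiming whatever a crash left on the list. None of these names come from Tux3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ORPHANS 16

/* Hypothetical in-memory inode with link and open counts. */
struct inode {
	unsigned long inum;
	unsigned nlink;    /* directory entries naming the inode */
	unsigned opens;    /* open file handles */
	bool freed;
};

/* Hypothetical orphan list: unlinked inodes still held open.  A real
 * filesystem would also record it on disk so recovery can free them. */
struct orphan_list {
	struct inode *items[MAX_ORPHANS];
	size_t count;
};

static void orphan_add(struct orphan_list *l, struct inode *in)
{
	assert(l->count < MAX_ORPHANS);
	l->items[l->count++] = in;
}

static void orphan_del(struct orphan_list *l, struct inode *in)
{
	for (size_t i = 0; i < l->count; i++)
		if (l->items[i] == in) {
			l->items[i] = l->items[--l->count];
			return;
		}
}

static void free_inode(struct inode *in) { in->freed = true; }

static void do_unlink(struct orphan_list *l, struct inode *in)
{
	assert(in->nlink > 0);
	if (--in->nlink)
		return;
	if (in->opens)
		orphan_add(l, in);   /* its data may still be written out meanwhile */
	else
		free_inode(in);
}

static void do_close(struct orphan_list *l, struct inode *in)
{
	assert(in->opens > 0);
	if (--in->opens || in->nlink)
		return;
	orphan_del(l, in);
	free_inode(in);
}

/* Recovery: anything still on the orphan list after a crash is reclaimed. */
static size_t orphan_replay(struct orphan_list *l)
{
	size_t n = l->count;
	while (l->count)
		free_inode(l->items[--l->count]);
	return n;
}
```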

OGAWA Hirofumi | 11 Nov 08:17

[PATCH 1/3] Use tuxtime() to update timestamp


# HG changeset patch
# User OGAWA Hirofumi <hirofumi <at> mail.parknet.co.jp>
# Date 1226387681 -32400
# Node ID 6b615de4d62bea052bb57474e754233f911edf1e
# Parent  dab895e2e896189f1764a7ec3330b6473757138e
Use tuxtime() to update timestamp

This also moves the time functions to tux3.h, so they can be used
without including inode.c.

diff -r dab895e2e896 -r 6b615de4d62b user/dir.c
--- a/user/dir.c	Mon Nov 10 20:15:37 2008 -0800
+++ b/user/dir.c	Tue Nov 11 16:14:41 2008 +0900
@@ -194,7 +194,7 @@
 	memcpy(entry->name, name, len);
 	entry->inum = cpu_to_le32(inum);
 	entry->type = ext2_type_by_mode[(mode & S_IFMT) >> STAT_SHIFT];
-	dir->i_mtime = dir->i_ctime = CURRENT_TIME_SEC;
+	dir->i_mtime = dir->i_ctime = tuxtime();
 	mark_inode_dirty(dir);
 	offset = (void *)entry - buffer->data;
 	brelse_dirty(buffer);
diff -r dab895e2e896 -r 6b615de4d62b user/inode.c
--- a/user/inode.c	Mon Nov 10 20:15:37 2008 -0800
+++ b/user/inode.c	Tue Nov 11 16:14:41 2008 +0900
@@ -16,26 +16,6 @@
 #define filemap_included
 #include "filemap.c"
 #undef main
[...]

OGAWA Hirofumi | 11 Nov 08:19

[PATCH 2/3] Convert more timestamp to high resolution in tux3fuse


# HG changeset patch
# User OGAWA Hirofumi <hirofumi <at> mail.parknet.co.jp>
# Date 1226387746 -32400
# Node ID d680e7bc61353f6c4de5ef24e30b4f5d8245150d
# Parent  6b615de4d62bea052bb57474e754233f911edf1e
Convert more timestamp to high resolution in tux3fuse

diff -r 6b615de4d62b -r d680e7bc6135 user/tux3fuse.c
--- a/user/tux3fuse.c	Tue Nov 11 16:14:41 2008 +0900
+++ b/user/tux3fuse.c	Tue Nov 11 16:15:46 2008 +0900
@@ -84,9 +84,24 @@
 			.attr = {
 				.st_ino   = inode->inum,
 				.st_mode  = inode->i_mode,
+#if 1
+				.st_atim  = {
+					.tv_sec  = high32(inode->i_atime),
+					.tv_nsec = millionths(inode->i_atime) * 1000,
+				},
+				.st_mtim  = {
+					.tv_sec  = high32(inode->i_mtime),
+					.tv_nsec = millionths(inode->i_mtime) * 1000,
+				},
+				.st_ctim  = {
+					.tv_sec  = high32(inode->i_ctime),
+					.tv_nsec = millionths(inode->i_ctime) * 1000,
+				},
+#else
 				.st_atime = high32(inode->i_atime),
[...]

OGAWA Hirofumi | 11 Nov 08:19

[PATCH 3/3] Add billionths() and use it, instead of millionths()


# HG changeset patch
# User OGAWA Hirofumi <hirofumi <at> mail.parknet.co.jp>
# Date 1226387771 -32400
# Node ID e50b06dfcea7f1d56578e12e91d50e5de47983cc
# Parent  d680e7bc61353f6c4de5ef24e30b4f5d8245150d
Add billionths() and use it, instead of millionths()

diff -r d680e7bc6135 -r e50b06dfcea7 user/tux3.h
--- a/user/tux3.h	Tue Nov 11 16:15:46 2008 +0900
+++ b/user/tux3.h	Tue Nov 11 16:16:11 2008 +0900
@@ -254,9 +254,9 @@
 	return tuxtimeval(now.tv_sec, now.tv_usec);
 }

-static inline unsigned millionths(fixed32 val)
+static inline unsigned billionths(fixed32 val)
 {
-	return (((val & 0xffffffff) * 1000000) + 0x80000000) >> 32;
+	return ((((val & 0xffffffff) * 1000000) + 0x80000000) >> 32) * 1000;
 }

 static inline u32 high32(fixed32 val)
diff -r d680e7bc6135 -r e50b06dfcea7 user/tux3fuse.c
--- a/user/tux3fuse.c	Tue Nov 11 16:15:46 2008 +0900
+++ b/user/tux3fuse.c	Tue Nov 11 16:16:11 2008 +0900
@@ -87,15 +87,15 @@
 #if 1
 				.st_atim  = {
 					.tv_sec  = high32(inode->i_atime),
[...]

Maciej Żenczykowski | 11 Nov 08:50

Re: Deferred Namespace Operations

I understood that one of the benefits of deferred creation was that a
later deletion could end up with no disk I/O at all.  I was just
pointing out that this is still not quite the case, since we need to
keep enough information to later release the file's data blocks...
although I guess a deleted file's data blocks could be allocated while
marking their use only in in-memory state (never writing it to disk).
However, this seems highly error-prone and not worth it.  As such, the
above optimization can only really be done if we're deleting a file to
which there are no more open references...

On Mon, Nov 10, 2008 at 23:15, Daniel Phillips <phillips <at> phunq.net> wrote:
> On Monday 10 November 2008 13:56, Maciej Żenczykowski wrote:
>> > two.  A file that is created and deleted before a delta transition
>> > takes place will not only never appear on disk, it will not even appear
>> > in a cached disk block.
>>
>> I'd like to point out, that if you create a file, open it, then delete
>> it, you can then still use it to store temporary data - this is indeed
>> a common use case.  However the amount of data storage may very well
>> exceed what you would be willing to store in ram, and thus you would
>> want to be able to write this data out to disk, even though the file
>> itself doesn't exist any more...  Some sort of swap-like behaviour???
>
> You mean an orphan temporary file?  I think we just need to make sure
> that works as it is supposed to.  It is reasonable for file data of
> such a file to be transferred to disk just like any other file, even
> though the file is unlinked.  We just need to be sure that it will get
> cleaned up like any other orphan.
>
> Daniel
[...]
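Maciej's condition can be stated as a simple predicate. In this hypothetical sketch, an unlink cancels a pending create entirely in cache only when two things hold: the inode was created in the delta that is still open, and no open references remain. In every other case the back end still has work to do. `struct pending_inode` and both functions are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-inode state tracked by the front end. */
struct pending_inode {
	unsigned create_delta;  /* delta in which the inode was created */
	unsigned opens;         /* open references remaining */
	bool needs_disk;        /* must the back end encode anything? */
};

/* A create+delete pair can vanish from cache only if both happened in
 * the still-open delta and nobody holds the file open; otherwise its
 * data may reach disk and must later be reclaimed as an orphan. */
static bool unlink_cancels_create(const struct pending_inode *in,
				  unsigned current_delta)
{
	return in->create_delta == current_delta && in->opens == 0;
}

static void front_end_unlink(struct pending_inode *in, unsigned current_delta)
{
	in->needs_disk = !unlink_cancels_create(in, current_delta);
}
```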

