Chris Mason | 1 Nov 2004 16:32

[PATCH RFC] O_DIRECT reads and writes without i_sem

Hello everyone,

Right now, O_DIRECT reads and writes on regular files have to take i_sem
while reading file metadata in order to make sure we don't race with
hole filling.

This patch tries to get around that by avoiding i_sem when we are doing
an O_DIRECT read or write inside of i_size.  Yet another rw semaphore is
added to struct inode to protect against holes being filled during the
O_DIRECT.  direct-io.c gets another special case to be aware of the
locking.

This has only been lightly tested, I'm posting here for general comments
before I go much further.  I'm rounding up some hardware with enough
disks to benchmark it properly.  

Only ext2 and reiserfs are modified to drop i_sem during O_DIRECT. ext3
needs some care around the orphan lists, and I didn't want to get into
that until the rest of the patch was working.

-chris

[patch against 2.6.10-rc1-mm2]

Index: linux.mm/fs/direct-io.c
===================================================================
--- linux.mm.orig/fs/direct-io.c	2004-11-01 09:34:24.000000000 -0500
+++ linux.mm/fs/direct-io.c	2004-11-01 09:34:40.000000000 -0500
@@ -915,7 +915,7 @@ out:
 }

Christoph Hellwig | 1 Nov 2004 17:08

Re: [PATCH RFC] O_DIRECT reads and writes without i_sem

On Mon, Nov 01, 2004 at 10:32:07AM -0500, Chris Mason wrote:
> Hello everyone,
> 
> Right now, O_DIRECT reads and writes on regular files have to take i_sem
> while reading file metadata in order to make sure we don't race with
> hole filling.
> 
> This patch tries to get around that by avoiding i_sem when we are doing
> an O_DIRECT read or write inside of i_size.  Yet another rw semaphore is
> added to struct inode to protect against holes being filled during the
> O_DIRECT.  direct-io.c gets another special case to be aware of the
> locking.
> 
> This has only been lightly tested, I'm posting here for general comments
> before I go much further.  I'm rounding up some hardware with enough
> disks to benchmark it properly.  
> 
> Only ext2 and reiserfs are modified to drop i_sem during O_DIRECT. ext3
> needs some care around the orphan lists, and I didn't want to get into
> that until the rest of the patch was working.

This gets too complicated for its own sake.  What about going down the
XFS route and making i_sem a r/w semaphore that's taken only shared
during read and write I/O, but exclusive while setting up write I/O
outside of i_size?  Alternatively just move the I/O locking into the
filesystem.
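
A minimal sketch of the lock-mode choice described above, in userspace C.
The names io_lock_mode and pick_lock_mode are hypothetical, made up for
illustration; they are not taken from XFS or from any kernel tree.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names, for illustration only. */
enum io_lock_mode { IO_LOCK_SHARED, IO_LOCK_EXCLUSIVE };

/*
 * Reads, and writes that stay inside i_size, take the lock shared.
 * A write that ends past i_size has to grow the file, so it takes the
 * lock exclusive while that extension is set up.
 */
static enum io_lock_mode pick_lock_mode(bool is_write, uint64_t offset,
                                        uint64_t len, uint64_t i_size)
{
    if (is_write && offset + len > i_size)
        return IO_LOCK_EXCLUSIVE;
    return IO_LOCK_SHARED;
}
```

Note that this only looks at i_size; a write that fills a hole inside
i_size would still be classified as shared here.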

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Chris Mason | 1 Nov 2004 17:56

Re: [PATCH RFC] O_DIRECT reads and writes without i_sem

On Mon, 2004-11-01 at 16:08 +0000, Christoph Hellwig wrote:
> On Mon, Nov 01, 2004 at 10:32:07AM -0500, Chris Mason wrote:

> > Right now, O_DIRECT reads and writes on regular files have to take i_sem
> > while reading file metadata in order to make sure we don't race with
> > hole filling.
[ ... ]
> This gets too complicated for its own sake.  What about going down the
> XFS route and making i_sem a r/w semaphore that's taken only shared
> during read and write I/O, but exclusive while setting up write I/O
> outside of i_size?  Alternatively just move the I/O locking into the
> filesystem.
> 

Nod, it is too complex; this is why I posted early ;)  The only place in
the FS-specific code where the locking I added to fs/direct-io.c could
live is each filesystem's get_blocks call.  I went for direct-io.c because
that's where all the other locking already was.  shrug.

If we do down_read(i_rw_sem) for all cases except growing the file, then
we still have no locking for O_DIRECT while we are filling holes in the
file.  The filesystem write function could do this:

	down_read(i_rw_sem)
	read fs metadata
	if (filling hole) {
		up_read(i_rw_sem)
		down_write(i_rw_sem)
		goto retry
	}

Adam J. Richter | 2 Nov 2004 11:33

Announcing Trapfs: a small lookup trapping filesystem, like autofs and devfs

 	I am pleased to announce trapfs, a virtual file system that
allows a user level program to trap dcache misses and fill them in
before the caller returns.  In many cases, it can provide the
functionality of autofs or devfs, but is smaller, at under 3kB .text +
.data, and 591 lines of source code, including some lengthy comments.
That is one third the source code line count of autofs, and just over a
fifth of its .data+.text size.  I also subjectively believe that trapfs
will usually be simpler to configure (although I don't know that it
completely obsoletes anything).  Documentation/filesystems/trapfs.txt
shows several example applications of trapfs using shell scripts or
very small programs.

	I also have a trapfs-based devfs which I am running now and
cleaning up for release in the next few days.  Trapfs can also be used
to provide create-on-demand device file functionality for some
non-devfs systems.

	Some of you may recall that almost two years ago I posted a
devfs reimplementation based on ramfs that was less than a quarter of
the size of the original devfs.  Trapfs is derived from that.
( http://marc.theaimsgroup.com/?l=linux-kernel&m=104138806530375&w=2 )

	My previous devfs code shrink was generally well received, but
not integrated due to "stable kernel" issues and my not pushing it at
the time.  This time, I would like to get trapfs and trapfs-based
devfs into the stock kernel pretty promptly.  So, please take a good
look at it and tell me what you think.

	If only trivially-fixed problems are identified, then I hope
to run and regenerate the patch against -bk11, fix any problems that

Jamie Lokier | 1 Nov 2004 22:43

Re: Announcing Trapfs: a small lookup trapping filesystem, like autofs and devfs

Adam J. Richter wrote:
> Trapfs can also be used to provide create-on-demand device file
> functionality for some non-devfs systems.

I understand udev is the "new" way to create device files when devices
are attached.

It works fine for real devices.  However, udev's weak point is that it
won't create devices like "/dev/ppp" and "/dev/net/tun0" until you
load their respective modules...  and you usually want an attempt to
load those devices to cause the modules to be loaded.

That's why on my laptop, when I want to run pppd, I have to do
"modprobe ppp_async" first.

There's an ugly workaround where the entire contents of /dev are
stored in a .tar.bz2 file which is restored at boot, including those
kinds of device nodes, but it is very ugly because that file
invariably ends up containing a lot of devices that you don't want,
and duplicating a lot of the settings in udev's config files.  When
using that .tar.bz2, there isn't a lot of point in using udev at all.

It would be nice if opening a non-existent file in /dev would trigger
a hotplug/udev event - but otherwise have a perfectly normal
tmpfs-like filesystem.  IMHO that would fix udev nicely.

Is trapfs suitable for that?

-- Jamie

Greg KH | 1 Nov 2004 23:04

Re: Announcing Trapfs: a small lookup trapping filesystem, like autofs and devfs

On Mon, Nov 01, 2004 at 09:43:53PM +0000, Jamie Lokier wrote:
> 
> It would be nice if opening a non-existent file in /dev would trigger
> a hotplug/udev event - but otherwise have a perfectly normal
> tmpfs-like filesystem.  IMHO that would fix udev nicely.

I've been considering creating a fs based on tmpfs that does just that
for udev, if only to keep issues like this from coming up all the time
:)

> Is trapfs suitable for that?

trapfs looks to be based on ramfs, not tmpfs, so I don't think it would
work out as well.  But Adam might have a different idea.

thanks,

greg k-h

Bryan Henderson | 2 Nov 2004 02:50

(unknown)

Does anyone know the current state of the maxsize= option etc. for ramfs?

3-4 years ago, there was code written to add critically missing function 
to ramfs to allow one to limit by mount option how big the filesystem 
could grow and to keep track of how much space was used.  I believe Red 
Hat distributed it at least for a while.

I don't see any such code in any of various current kernel source trees 
I've looked at.  I do see via a web search lots of people using the 
maxsize= mount option, probably blissfully unaware that the filesystem 
driver ignores all mount options.

It seems to me ramfs is too dangerous to be usable without a size limit.

--
Bryan Henderson                          IBM Almaden Research Center
San Jose CA                              Filesystems

Adam J. Richter | 2 Nov 2004 07:50

Re: Announcing Trapfs: a small lookup trapping filesystem, like autofs and devfs

On Mon, 1 Nov 2004 14:04:59 -0800, Greg KH wrote:
>On Mon, Nov 01, 2004 at 09:43:53PM +0000, Jamie Lokier wrote:
>> 
>> It would be nice if opening a non-existent file in /dev would trigger
>> a hotplug/udev event - but otherwise have a perfectly normal
>> tmpfs-like filesystem.  IMHO that would fix udev nicely.

>I've been considering creating a fs based on tmpfs that does just that
>for udev, if only to keep issues like this from coming up all the time
>:)

>> Is trapfs suitable for that?

>trapfs looks to be based on ramfs, not tmpfs, so I don't think it would
>work out as well.  But Adam might have a different idea.

	If I understand correctly, there are three or four
advantages of using tmpfs:

	1. You can set the maximum number of inodes.

	2. You can set the maximum number of blocks used for plain
	   file data.

	3. The plain file data blocks can be swapped out.

	4. Extended attributes.

	For traditional devfs and autofs applications, neither of
which create data files, only limiting the number of inodes and

Adam J. Richter | 3 Nov 2004 03:47

[PATCH 2.6.10-rc1-bk11] user-level lookup handler for tmpfs

	Here is a variant of the trapfs patch that I posted
yesterday.

	For anyone who missed that posting, it provided
a way to invoke a user level helper program whenever someone
attempts to access a nonexistent file in the file system,
by which one can implement facilities similar to autofs and
devfs.  The file Documentation/filesystems/lookup-trap.txt
provides some simple useful examples.  Here is a URL for
my previous trapfs posting:
http://marc.theaimsgroup.com/?l=linux-fsdevel&m=109933559226369&w=2

	The big change in this version is that it is implemented
as an additional option to tmpfs rather than as a new file system.
With this patch, you can trap file name misses on a tmpfs file
system by doing something like:

	mount -t tmpfs -o handler=/bin/my-autofs-program foo /mnt
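
	The same mount can be requested from a C program through
mount(2).  This is only a sketch: the handler= option exists only with
this patch applied, and the function names build_handler_opts and
mount_trapping_tmpfs are made up for the example.

```c
#include <stdio.h>
#include <sys/mount.h>

/*
 * Build the tmpfs option string "handler=<path>".  The handler= option
 * comes from the patch in this posting; a tmpfs without the patch would
 * be expected to reject it as an unknown option.
 */
static int build_handler_opts(char *buf, size_t len, const char *handler)
{
    int n = snprintf(buf, len, "handler=%s", handler);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Equivalent of: mount -t tmpfs -o handler=<handler> foo <target> */
static int mount_trapping_tmpfs(const char *target, const char *handler)
{
    char opts[256];

    if (build_handler_opts(opts, sizeof(opts), handler) < 0)
        return -1;
    /* Needs CAP_SYS_ADMIN; returns -1 with errno set on failure. */
    return mount("foo", target, "tmpfs", 0, opts);
}
```

	Calling mount_trapping_tmpfs("/mnt", "/bin/my-autofs-program")
corresponds to the command line above.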

	Most of the callback code is in a separate file in fs/helper.c
so that it can easily be called if any other file system wants
to use the facility, although it is only compiled into the kernel
if needed.

	In order to implement this as an option to tmpfs, I
had to change tmpfs slightly to make it allocate its
superblock private information even when there are no
allocation limits specified, although I was able to preserve
the old behavior for the case where CONFIG_TMPFS is not defined.
This is a really small amount of memory, perhaps 48 bytes, but

Adam J. Richter | 3 Nov 2004 08:43

[PATCH 2.6.10-rc1-bk11] devfs on tmpfs, deletes a lot of code (resend)

	This patch is a replacement implementation of devfs.
This patch, combined with a tmpfs patch that is required for certain
devfs functionality, is a net deletion of more than 2000 lines
of code[1].  The code that actually remains in fs/devfs has a
.text+.data size of under 3kB.

	The implementation creates an instance of tmpfs and executes
the devfs operations that device drivers request on that instance.
You can use the "helper=" mount option patch that I posted for
tmpfs along with the devfs_helper program[2] to handle the LOOKUP
commands in your existing /etc/devfsd.conf to have automatic loading
of kernel modules or execution of other commands in response to
attempts to access nonexistent device files.  The devfs_helper program
is invoked when needed, so there is no longer a persistent devfs daemon.

	From the point of view of device drivers, I believe the only
visible interface change is that devfs_mk_symlink has been changed
to take printf-style arguments, compatible with the other existing
devfs_mk_ calls and to take the link contents as the first argument
rather than the second.  The parameter order now is compatible with
what you pass to the "ln -s" command in the shell, and, more importantly,
allows for the printf-style arguments, which really does make the
devfs implementation slightly smaller and also allows the compiler
to eliminate more code from callers of this facility when devfs is
compiled out.  This patch includes updates to all the callers of
devfs_mk_symlink.

	I have been running variants of this devfs implementation
continuously on a number of computers for almost two years, although
the code for piggybacking on another file system is only about a week

