Sasha Levin | 28 Jan 21:30 2015

[PATCH v2] vfs: read file_handle only once in handle_to_path

We used to read file_handle twice. Once to get the amount of extra bytes, and
once to fetch the entire structure.

This may be problematic since we do size verifications only after the first
read, so if the number of extra bytes changes in userspace between the first
and second calls, we'll have an incoherent view of file_handle.

Instead, read the constant size once, and copy that over to the final
structure without having to re-read it again.

Signed-off-by: Sasha Levin <sasha.levin <at>>
Change in v2:
 - Use the f_handle pointer rather than size of struct
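
For readers unfamiliar with the pattern, here is a minimal userspace sketch of the "read the header once" idea described above. It is not the kernel code: struct fh, FH_MAX_BYTES, fetch() and copy_handle_once() are hypothetical stand-ins (fetch() plays the role of copy_from_user()), and the size limit is an assumed value chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct file_handle. */
struct fh {
	unsigned int handle_bytes;	/* number of bytes in f_handle[] */
	int handle_type;
	unsigned char f_handle[];	/* variable-length payload */
};

#define FH_MAX_BYTES 128	/* assumed limit, for illustration only */

/* Hypothetical stand-in for copy_from_user(); returns 0 on success. */
static int fetch(void *dst, const void *src, size_t n)
{
	memcpy(dst, src, n);
	return 0;
}

/*
 * Copy a handle out of untrusted memory, reading the fixed-size header
 * exactly once. Returns a malloc'ed copy, or NULL on failure.
 */
static struct fh *copy_handle_once(const struct fh *ufh)
{
	struct fh hdr, *handle;

	/* The only read of the header. */
	if (fetch(&hdr, ufh, sizeof(hdr)))
		return NULL;

	/* Validate the private copy, which the other side cannot change. */
	if (hdr.handle_bytes == 0 || hdr.handle_bytes > FH_MAX_BYTES)
		return NULL;

	handle = malloc(sizeof(*handle) + hdr.handle_bytes);
	if (!handle)
		return NULL;

	/* Reuse the validated header instead of fetching it again. */
	*handle = hdr;
	assert(handle->handle_bytes == hdr.handle_bytes);

	/* Fetch only the trailing bytes, sized by the validated length. */
	if (fetch(handle->f_handle, ufh->f_handle, hdr.handle_bytes)) {
		free(handle);
		return NULL;
	}
	return handle;
}
```

Because the length used for the allocation and for the second copy both come from hdr, a concurrent change to the source buffer can alter the payload bytes but not the size that was validated.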

 fs/fhandle.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/fhandle.c b/fs/fhandle.c
index 999ff5c..d59712d 100644
--- a/fs/fhandle.c
+++ b/fs/fhandle.c
@@ -195,8 +195,9 @@ static int handle_to_path(int mountdirfd, struct file_handle __user *ufh,
 		goto out_err;
 	/* copy the full handle */
-	if (copy_from_user(handle, ufh,
-			   sizeof(struct file_handle) +
+	*handle = f_handle;
+	if (copy_from_user(&handle->f_handle,

Christoph Hellwig | 27 Jan 18:55 2015

[RFC] split struct kiocb

This series cuts down the number of fields in the public iocb that is
allocated on the stack for every synchronous I/O, both by removing fields
from it and by adding an aio-specific iocb that is only allocated
for aio requests.

Additionally it cleans up various corner cases in the aio completion
code and adds a simple in-kernel async read/write interface.

Note that there still are some issues with fuse, see the first patch
for details.

To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo <at>
More majordomo info at

Daniel Drake | 27 Jan 17:22 2015

Bind mount moved on top of parent mount results in unmountable/invisible mount


The admittedly quirky mount setup mentioned in the subject line
produces a questionable result on 3.18, and a different but also
questionable result on 3.19. I'd like to understand if this is a
kernel bug, or simply something that userspace should be responsible
for managing better.

The context is an initramfs, which mounts the root partition at
/sysroot. The actual regular filesystem that we want to boot from is
in the /fs subdirectory of that mount.

This setup is achieved from the initramfs as follows:

1. /sysroot is mounted as normal. /proc/mounts has:
/dev/sda1 /sysroot ext4,relatime,data=ordered rw 0 0

2. /sysroot/fs is bind-mounted at /sysroot/fs. Now it can be treated
as its own mount point. /proc/mounts now has:
/dev/sda1 /sysroot ext4,relatime,data=ordered rw 0 0
/dev/sda1 /sysroot/fs ext4,relatime,data=ordered rw 0 0

3. /sysroot/fs is MS_MOVE'd to /sysroot
/dev/sda1 /sysroot ext4,relatime,data=ordered rw 0 0
/dev/sda1 /sysroot ext4,relatime,data=ordered rw 0 0

(This is a simplified version of what ostree does)
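
For reference, the three steps above could be expressed with mount(2) roughly as follows. This is only a sketch under assumptions: the struct mount_step table and run_steps() are hypothetical names, mount options are omitted, and the dry_run flag is there so the sequence can be printed without root privileges. It is not taken from ostree or from the initramfs in question.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/mount.h>

/* Hypothetical description of one mount(2) call. */
struct mount_step {
	const char *source;
	const char *target;
	const char *fstype;	/* NULL for bind mounts and moves */
	unsigned long flags;
};

/* The three steps from the report, in order. */
static const struct mount_step steps[] = {
	{ "/dev/sda1",   "/sysroot",    "ext4", 0 },		/* 1. normal mount */
	{ "/sysroot/fs", "/sysroot/fs", NULL,   MS_BIND },	/* 2. bind onto itself */
	{ "/sysroot/fs", "/sysroot",    NULL,   MS_MOVE },	/* 3. move over parent */
};

#define NUM_STEPS (sizeof(steps) / sizeof(steps[0]))

/*
 * Print each step and, unless dry_run is set, perform it.
 * Returns 0 on success, -1 on the first failing mount(2) call.
 */
static int run_steps(int dry_run)
{
	for (size_t i = 0; i < NUM_STEPS; i++) {
		const struct mount_step *s = &steps[i];

		assert(s->source && s->target);
		printf("step %zu: mount(%s, %s, %s, %#lx)\n", i + 1,
		       s->source, s->target,
		       s->fstype ? s->fstype : "NULL", s->flags);
		if (dry_run)
			continue;
		if (mount(s->source, s->target, s->fstype, s->flags, NULL) != 0) {
			perror("mount");
			return -1;
		}
	}
	return 0;
}
```

Calling run_steps(1) only prints the sequence; run_steps(0) needs root and a matching device and directory layout.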

At this point, I can unmount /sysroot twice, and we're back to where

David Howells | 26 Jan 23:02 2015

Overlayfs, *notify() and file locking...

Having looked briefly at *notify() and file locking with an eye to doing some
changes there to provide support for LSMs and procfs for overlayfs/unionmount type
things, I'm wondering how we're going to manage these two facilities.

The problem with both of these (afaict) is that they attach things to the
inode(s) to be watched.  Now, take overlayfs for an example:

Say you have a file that is pristine and on the lower layer.  You open it read
only and lock it.  Someone else then opens it for writing.  Even if there's a
mandatory lock on it, it will be copied up, and the copy will have no locks on
it.  Now, we can get round that - sort of - by duplicating, sharing or moving
the locking records between the inodes (though they may well exist on widely
different media).

This is probably manageable, provided there isn't one or more servers involved
(imagine if you've got one layer on NFS and another on CIFS, for example).
Furthermore, if there are leases, we have to manage those trans-copyup also.

Note that moving the lock may not be possible if the R/O file is still open
and still locked.  The R/O file still refers to the R/O copy, even after the
copy up.

The situation is slightly complicated in the case of overlayfs in that there's
a third inode - the overlay inode - around, though that's probably bypassed by
file->f_inode pointing to one of the other layers.  Note that to get proc and
LSMs working, I need to make file->f_path point to the overlay/union layer
whilst file->f_inode points to the upper/lower layer inode.

The situation is more complicated in the case of unionmount if we go there as
there *is* no top inode to hang things off until we try to write to the union

Jan Kara | 26 Jan 15:34 2015

[PATCH 00/16 v4] quota: Unify VFS and XFS quota interfaces


This is another iteration of patches to unify VFS and XFS quota interfaces so
that XFS quotactls work for filesystems with VFS quotas and vice versa.  This
is useful so that userspace doesn't have to care that much about which
filesystem it is using at least when using basic quota functionality. In
particular we should be able to reuse project quota tests in xfstests for ext4.

The patches are based on top of 'for_next' branch of my tree [1] which already
contains quota cleanup series [2] and XFS cleanup series [3] I've sent
previously. The patch series can also be pulled from 'quota_interface' branch
of my tree.

Since the previous version I have addressed all Christoph's comments (dropped
patch for Q_XQUOTASYNC quotactl, moved some code around) and also put the fix
for Q_GETQUOTA vs Q_XGETQUOTA breakage at the beginning of the series and
added CC to stable for it since some users have hit the problem in practice.

The patch series has already been reviewed by Christoph (thanks!) up to unification
of Q_GETXSTAT[EV] with Q_GETINFO so if no one objects in a few days, I'll push
the reviewed part of the series to my tree in the second half of this week (so
that it gets some exposure in linux-next before the merge window).

Review of the remaining parts is welcome!


[1] git://

Alireza Haghdoost | 25 Jan 21:32 2015

Reset /sys/block/<dev>/stat

Is there any way to reset the stat counters (like 'number of read I/Os
processed') in /sys/block/<dev>/stat at run time?


Theodore Ts'o | 25 Jan 04:02 2015

[PATCH-v8 0/3] add support for a lazytime mount option

This is an updated version of what had originally been an
ext4-specific patch which significantly improves performance by lazily
writing timestamp updates (and in particular, mtime updates) to disk.
The in-memory timestamps are always correct, but they are only written
to disk when required for correctness.

This provides a huge performance boost for ext4 due to how it handles
journalling, but it's valuable for all file systems running on flash
storage or drive-managed SMR disks by reducing the metadata write
load.  So upon request, I've moved the functionality to the VFS layer.
Once the /sbin/mount program adds support for MS_LAZYTIME, all file
systems should be able to benefit from this optimization.

There is still an ext4-specific optimization, which may be applicable
for other file systems which store more than one inode in a block, but
it will require file system specific code.  It is purely optional.

For people interested in seeing how timestamp updates are held back, the
following example commands to enable the tracepoint debugging may be helpful:

  mount -o remount,lazytime /
  cd /sys/kernel/debug/tracing
  echo 1 > events/writeback/writeback_lazytime/enable
  echo 1 > events/writeback/writeback_lazytime_iput/enable
  echo "state & 2048" > events/writeback/writeback_dirty_inode_enqueue/filter
  echo 1 > events/writeback/writeback_dirty_inode_enqueue/enable
  echo 1 > events/ext4/ext4_other_inode_update_time/enable
  cat trace_pipe

Al Viro | 25 Jan 03:39 2015

[git pull] vfs.git fixes

A couple of fixes - deadlock in CIFS and build breakage in cris serial
driver (resurfaced f_dentry in there).  Please, pull from
git:// for-linus

Al Viro (1):
      fix deadlock in cifs_ioctl_clone()

David Howells (1):
      VFS: Convert file->f_dentry->d_inode to file_inode()

 arch/cris/arch-v32/drivers/sync_serial.c |  2 +-
 fs/cifs/ioctl.c                          | 21 +++++----------------
 2 files changed, 6 insertions(+), 17 deletions(-)

Kinglong Mee | 24 Jan 10:06 2015

[PATCH] f2fs: fix a bug of inheriting default ACL from parent

Introduced by a6dda0e63e97122ce9e0ba04367e37cca28315fa
"f2fs: use generic posix ACL infrastructure".

Testing default ACLs on a recent kernel (3.19.0-rc5) gives:
# setfacl -dm g:root:rwx test/
# getfacl test/
# file: test/
# owner: root
# group: root

# cd test/
# mkdir testdir
# getfacl testdir/
# file: testdir/
# owner: root
# group: root
                // missing an acl "group:root:rwx" inherited from parent

Steve French | 23 Jan 06:40 2015

[LSF] Topics for discussion

NFS and Samba/SMB3 requirements

There are various topics of common interest to network file systems
(and probably some cluster file systems as well) - common to both
NFSv4.x and SMB3, and some are problems for both the kernel clients
and the file servers.

For example

- how to allow full support for leases (including upgrades/downgrades)
and delegations
- whether to allow directory leases/delegations
- faster file copy; the proper common API into copy offload (allow
faster server side copy via either T10-like or CopyChunk mechanisms)
- Whether there is value in implementing the new cache/no-cache flags
on read/write, and per-write writethrough flag
- RichACLs (already mentioned as topic for discussion in earlier posts)

Similarly, testing of NFSv4.x and SMB3 has many common pain points -
and the recent improvements to xfstests allow more testing of network
file systems, but also show problems common to both NFS and SMB3.




Thanos Makatos | 22 Jan 18:13 2015

O_DIRECT not working with vers=3.0

In dca692880e887739a669f6c41a80ca68ce2b09fc I see that ".direct_IO = cifs_direct_io" is added only to
"cifs_addr_ops" but not to "cifs_addr_ops_smallbuf".
Presuming that the only difference between the two structs is the size of the buffer they operate on
(judging by the name), shouldn't ".direct_IO = cifs_direct_io" be added to "cifs_addr_ops_smallbuf"
as well?
In a test environment an open(2) using O_DIRECT didn't work (vers=3.0) but when I added ".direct_IO =
cifs_direct_io" to "cifs_addr_ops_smallbuf" it worked.
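
A small probe along the following lines can be used to check whether open(2) with O_DIRECT succeeds on a given mount. It is a sketch, not the test that was actually run: probe_o_direct() is a hypothetical helper, and the expectation that a filesystem without a direct_IO method fails the open with EINVAL is an assumption, since the report above only says the open "didn't work".

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* glibc only exposes O_DIRECT with this defined */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Try to open path read-only with O_DIRECT.
 * Returns 0 if the open succeeds, or a negative errno value otherwise.
 */
static int probe_o_direct(const char *path)
{
	int fd;

	assert(path != NULL);
	fd = open(path, O_RDONLY | O_DIRECT);
	if (fd < 0)
		return -errno;
	close(fd);
	return 0;
}
```

Calling probe_o_direct() on an existing file on the CIFS mount, before and after the change, would show whether the open itself is what fails; a return value of -EINVAL would be consistent with a missing .direct_IO method.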

Thanos Makatos