Linus Torvalds | 18 Dec 19:41 2014

Re: [PATCH] mnt: Fix a memory stomp in umount

On Thu, Dec 18, 2014 at 9:07 AM, Linus Torvalds
<torvalds <at>> wrote:
> Why is this piece of code using its own made up and buggy list handling in
> the first place? We have list functions for these things, exactly so that
> people shouldn't write buggy stuff by hand.

Oh. Ok, I see what's going on. We have "list_splice()", but we don't
have the equivalent "hlist_splice()". So it's doing that by hand, and
did it badly.

Al, this is your bug. I guess I can take the "manual hlist_splice" fix
from Eric, but I'm not really happy with it. There's a few other
places in that same commit where the list splice operation has been

Mind taking a look?

To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo <at>
More majordomo info at

Eric W. Biederman | 18 Dec 17:57 2014

[PATCH] mnt: Fix a memory stomp in umount

While reviewing the code of umount_tree I realized that when we append
to a preexisting unmounted list we do not change pprev of the former
first item in the list.

Which means later in namespace_unlock hlist_del_init(&mnt->mnt_hash) on
the former first item of the list will stomp unmounted.first leaving
it set to some random mount point which we are likely to free soon.

This isn't likely to hit, but if it does I don't know how anyone could
track it down.
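
Editorial sketch of the failure mode described above, written as userspace C. The two list types and the add/delete helpers copy the logic of the kernel's hlist primitives; `splice_front()` and `stomps_head()` are hypothetical names introduced only for this illustration and are not the code in fs/namespace.c. With `fix_pprev` set to 0 the old first node keeps a stale `pprev` that still points at the list head, so deleting it overwrites `head.first`.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace copies of the kernel's hlist types. */
struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

/* Same logic as the kernel's hlist_add_head(). */
static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	struct hlist_node *first = h->first;

	n->next = first;
	if (first)
		first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Same logic as the kernel's __hlist_del(): unlink through pprev. */
static void hlist_del(struct hlist_node *n)
{
	*n->pprev = n->next;
	if (n->next)
		n->next->pprev = n->pprev;
}

/*
 * Hypothetical helper: move all nodes of 'from' to the front of 'to'.
 * 'last' must be the final node of 'from'. With fix_pprev == 0 the
 * pprev of the node that used to be first in 'to' is left untouched,
 * which is the mistake the patch description talks about.
 */
static void splice_front(struct hlist_head *from, struct hlist_node *last,
			 struct hlist_head *to, int fix_pprev)
{
	if (!last)
		return;
	if (fix_pprev && to->first)
		to->first->pprev = &last->next;
	last->next = to->first;
	to->first = from->first;
	to->first->pprev = &to->first;
	from->first = NULL;
}

/* Returns 1 if deleting the old first node corrupts head.first, else 0. */
static int stomps_head(int fix_pprev)
{
	struct hlist_head unmounted = { NULL }, tmp = { NULL };
	struct hlist_node old_first, a, b;

	hlist_add_head(&old_first, &unmounted);	/* unmounted: old_first */
	hlist_add_head(&b, &tmp);
	hlist_add_head(&a, &tmp);		/* tmp: a -> b */

	/* unmounted should now be: a -> b -> old_first */
	splice_front(&tmp, &b, &unmounted, fix_pprev);
	hlist_del(&old_first);

	/* A healthy list still starts at 'a' after its tail is removed. */
	assert(fix_pprev ? unmounted.first == &a : 1);
	return unmounted.first != &a;
}
```

Calling `stomps_head(0)` reports the corruption and `stomps_head(1)` does not. In this small case the head ends up NULL because the stale node is last in the list; in a longer list it would end up pointing at whatever followed the stale node.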

Fixes: 38129a13e6e71f666e0468e99fdd932a687b4d7e switch mnt_hash to hlist
Cc: stable <at>
Signed-off-by: "Eric W. Biederman" <ebiederm <at>>

Al, do you want to take this one, or would you like me to make certain it
makes it to Linus?

 fs/namespace.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/namespace.c b/fs/namespace.c
index fe1c77145a78..6afbd7bb79f3 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1370,6 +1370,8 @@ void umount_tree(struct mount *mnt, int how)
 	if (last) {

Josef Bacik | 18 Dec 17:40 2014

[PATCH] fs: don't softlockup when evicting inodes

If I run an fs_mark job that creates millions of empty files and then
immediately unmount the file system I will get a softlockup during unmount.
This box has ~140gb of RAM so we never hit sufficient memory pressure to evict
enough inodes during the runtime of the benchmark, which means I see around 80
million inodes being evicted at unmount time.  With this patch my box no longer
softlocks up.  Thanks,

Signed-off-by: Josef Bacik <jbacik <at>>
 fs/inode.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/inode.c b/fs/inode.c
index ad60555..1a60ed1 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -581,6 +581,7 @@ static void dispose_list(struct list_head *head)

+		cond_resched();
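
Editorial sketch of the idea behind the one-line change, as a userspace analogy only. `cond_resched()` is a kernel facility and cannot be called from userspace, so `sched_yield()` is used here as a rough stand-in (it yields unconditionally, which `cond_resched()` does not). The `struct item` type and both function names are assumptions made for this illustration; this is not the kernel's `dispose_list()`.

```c
#include <sched.h>
#include <stdlib.h>

/* Hypothetical list element standing in for an inode on a dispose list. */
struct item { struct item *next; };

/* Build a list of n items; returns NULL if n is 0 or an allocation fails. */
static struct item *build_list(size_t n)
{
	struct item *head = NULL;

	while (n--) {
		struct item *it = malloc(sizeof(*it));

		if (!it) {
			/* Undo the partial list before giving up. */
			while (head) {
				struct item *next = head->next;

				free(head);
				head = next;
			}
			return NULL;
		}
		it->next = head;
		head = it;
	}
	return head;
}

/*
 * Free every item and return how many were freed. After each item the
 * loop offers the CPU to other runnable tasks, so a very long list does
 * not monopolise the processor for the whole teardown.
 */
static size_t dispose_list(struct item *head)
{
	size_t freed = 0;

	while (head) {
		struct item *next = head->next;

		free(head);
		freed++;
		sched_yield();	/* userspace stand-in for cond_resched() */
		head = next;
	}
	return freed;
}
```

For example, `dispose_list(build_list(1000))` frees all 1000 items while giving the scheduler a chance to run something else between each one.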




Jan Kara | 18 Dec 16:37 2014

[PATCH 0/25 RFC v2] quota: Unify VFS and XFS quota interfaces


  This is the second attempt to unify VFS and XFS quota interfaces so that XFS
quotactls work for filesystems with VFS quotas and vice versa.  This is useful
so that userspace doesn't have to care that much about which filesystem it is
using at least when using basic quota functionality. In particular we should be
able to reuse project quota tests in xfstests for ext4.

In this patch set I unify quotaon / quotaoff handling (Q_QUOTAON, Q_QUOTAOFF,
Q_XQUOTAON, Q_XQUOTAOFF calls), fix some bugs in the Q_XGETQUOTA, Q_XSETQLIM,
Q_GETQUOTA, Q_SETQUOTA calls done by Christoph some time ago, unify
Q_GETXSTATE, Q_GETXSTATV with Q_GETINFO, and also wire up Q_SETINFO to work
with XFS (and Q_XSETQLIM for id 0 to modify time limits for VFS quotas).
So after these patches xfs_quota can manipulate quotas for ext4 filesystem and
similarly quota-tools can be used to manipulate quotas for xfs filesystem.

I have also verified that xfstests pass fine both for xfs and ext4 so
hopefully I didn't introduce any regression in the current functionality (I
also did a couple of manual checks for timer setting etc).

All the comments to the first version of the series are hopefully addressed,
except for Dave's request to reduce number of copying of dquot information -
I have benchmarked that reducing number of copies from 3 to 2 brings just 2%
improvement in speed in my test setup and getting quota information isn't IMHO
so performance critical that it would be worth the complications of the code.

Patches are against Linus' tree as of today - I know XFS tree may have some
conflicting changes (I think Dave even picked up some XFS patches from the
beginning of the series) but I wanted the patch series to be self-contained.


Jan Kara | 18 Dec 13:49 2014

[PATCH 0/5 v2] fs: Fixes for removing xid bits and security labels


  A warning in XFS made me look in detail at how clearing of suid / sgid
bits and security labels is done. And I've spotted a few issues:
1) MS_NOSEC handling is broken - we set it after each file_remove_suid() call.
   However, the suid bit may not have been removed, simply because the caller
   has CAP_FSETID, and further writes to the file from processes without this
   capability still need to clear the suid bit.
2) file_remove_suid() is a misnomer since it also handles removing of
   security labels. It is even more confusing because should_remove_suid()
   doesn't return whether file_remove_suid() is needed or not.
3) On truncate we do clear suid bits but not security labels. According to
   documentation in include/linux/security.h that's a bug but please correct
   me if I'm wrong.
4) ocfs2 doesn't clear security labels - hard to fix, I left it alone for now.
5) XFS didn't provide proper exclusion for clearing mode bits.

  This series aims at fixing above issues.

  Since v1 I have removed bogus patch changing inode_set_flags(), I have
updated changelog of patch 4/5 to better explain why ->inode_killpriv should
be called and I have included a fix for MS_NOSEC handling in this series.
Al, can you please merge the patches? Thanks!


J. Bruce Fields | 17 Dec 20:59 2014

[PATCH] dcache: return -ESTALE not -EBUSY on distributed fs race

From: "J. Bruce Fields" <bfields <at>>

On a distributed filesystem it's possible for lookup to discover that a
directory it just found is already cached elsewhere in the directory
hierarchy.  The dcache won't let us keep the directory in both places,
so we have to move the dentry to the new location from the place we
previously had it cached.

If the parent has changed, then this requires all the same locks as we'd
need to do a cross-directory rename.  But we're already in lookup
holding one parent's i_mutex, so it's too late to acquire those locks in
the right order.

The (unreliable) solution in __d_unalias is to trylock() the required
locks and return -EBUSY if it fails.

I see no particular reason for returning -EBUSY, and -ESTALE is already
the result of some other lookup races on NFS.  I think -ESTALE is the
more helpful error return.  It also allows us to take advantage of the
logic Jeff Layton added in c6a9428401c0 "vfs: fix renameat to retry on
ESTALE errors" and ancestors, which hopefully resolves some of these
errors before they're returned to userspace.
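
Editorial sketch of the retry-on-ESTALE control flow the paragraph above relies on. This is a simplified userspace illustration, not the kernel's code: `lookup_fn`, `call_with_estale_retry()`, `stale_once()` and `always_busy()` are hypothetical names, and the single plain retry stands in for the kernel repeating the operation with a revalidating lookup. The point is only that an -ESTALE result gets a second chance, while an -EBUSY result is passed straight back to the caller.

```c
#include <errno.h>

/* Hypothetical operation type: returns 0 on success or a negative errno. */
typedef int (*lookup_fn)(void *ctx);

/* Run op once; if it reports -ESTALE, run it exactly one more time. */
static int call_with_estale_retry(lookup_fn op, void *ctx)
{
	int err = op(ctx);

	if (err == -ESTALE)
		err = op(ctx);	/* single retry; other errors are not retried */
	return err;
}

/* Demo operation: fails with -ESTALE on the first call, then succeeds. */
static int stale_once(void *ctx)
{
	int *calls = ctx;

	return (*calls)++ == 0 ? -ESTALE : 0;
}

/* Demo operation: always fails with -EBUSY and counts its calls. */
static int always_busy(void *ctx)
{
	int *calls = ctx;

	(*calls)++;
	return -EBUSY;
}
```

With `stale_once` the wrapper calls the operation twice and returns 0; with `always_busy` it calls it once and returns -EBUSY, which is why the choice of error code decides whether a transient race is retried at all.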

I can reproduce these cases using NFS with:

	ssh root@$client '
		mount -olookupcache=pos '$server':'$export' /mnt/
		mkdir /mnt/TO
		mkdir /mnt/DIR
		touch /mnt/DIR/test.txt

Chao Yu | 17 Dec 11:10 2014

[f2fs-dev][PATCH v2] f2fs: use ra_meta_pages to simplify readahead code in restore_node_summary

Use more common function ra_meta_pages() with META_POR to readahead node blocks
in restore_node_summary() instead of ra_sum_pages(), hence we can simplify the
readahead code there, and also we can remove unused function ra_sum_pages().

changes from v1:
 o fix one bug when using truncate_inode_pages_range which is pointed out by
   Jaegeuk Kim.

Signed-off-by: Chao Yu <chao2.yu <at>>
 fs/f2fs/node.c | 68 +++++++++++++---------------------------------------------
 1 file changed, 15 insertions(+), 53 deletions(-)

diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index 5aa54a0..ab48b4c 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -1726,80 +1726,42 @@ int recover_inode_page(struct f2fs_sb_info *sbi, struct page *page)
 	return 0;
 }

-/*
- * ra_sum_pages() merge contiguous pages into one bio and submit.
- * these pre-read pages are allocated in bd_inode's mapping tree.
- */
-static int ra_sum_pages(struct f2fs_sb_info *sbi, struct page **pages,
-				int start, int nrpages)
-{
-	struct inode *inode = sbi->sb->s_bdev->bd_inode;
-	struct address_space *mapping = inode->i_mapping;

Jeremiah Mahler | 16 Dec 12:55 2014

[BUG, linux-next] spawn PID 1 without CLONE_FS, wireless inop


The wireless network interface has become inoperative when running
linux-next 20141216 on a Lenovo Carbon X1.  It is completely
non-existent and `ip addr` doesn't show it.  A bisect has found that
the bug was introduced by the following commit.

  commit 9d328afb18f05c25686102ad890a67bb3ca38aab
  Author: Al Viro <viro <at>>
  Date:   Thu Dec 11 22:34:21 2014 -0500

      spawn PID 1 without CLONE_FS, give kernel threads zero umask

      Don't give PID 1 init_fs, give it a copy of its own when it's
      Then we can make init_fs.umode zero, and have both the PID 1 and
      everything that gets spawned by call_usermodehelper() set
      to old value (0022) early on.

      Signed-off-by: Al Viro <viro <at>>

Below is my network interface information.  And the iwlwifi modules are
being used.

  $ lspci
  03:00.0 Network controller: Intel Corporation Centrino Advanced-N 6205
  [Taylor Peak] (rev 96)

Jan Kara | 16 Dec 10:53 2014

[GIT PULL] isofs and reiserfs fix

  Hello Linus,

  could you please pull from

git:// for_linus

to get a reiserfs and an isofs fix. They arrived after I sent you my first
pull request and I don't want to delay them unnecessarily till rc2.

Top of the tree is f54e18f1b831. The full shortlog is:

Jan Kara (1):
      isofs: Fix infinite looping over CE entries

Jiri Slaby (1):
      reiserfs: destroy allocated commit workqueue

The diffstat is

 fs/isofs/rock.c     | 6 ++++++
 fs/reiserfs/super.c | 3 +++
 2 files changed, 9 insertions(+)



Jan Kara <jack <at>>

Fiedler Roman | 15 Dec 18:39 2014

O_CREAT|O_DIRECTORY on nonexisting file with ext4 not posix-compliant


It seems that the open syscall is not POSIX-compliant when using both
O_CREAT|O_DIRECTORY. This was discussed in [1] with a reference to the POSIX

A simple test program is:

#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>

int main(int argc, char **argv) {
  int fd;
  struct stat statBuf;
  int result;

  fd=open("xxx", O_RDWR|O_CREAT|O_DIRECTORY, 0600);
  result=fstat(fd, &statBuf);
  if(result) {
    fprintf(stderr, "Stat failed\n");
    return(1);
  }
  fprintf(stderr, "New element type is %d\n", S_ISDIR(statBuf.st_mode));
  return(0);
}

Kind Regards,


Masatake YAMATO | 15 Dec 14:30 2014

[PATCH] coredump: use defined macro instead of raw value in filp_open

`2' was hard-coded where O_RDWR can be used.
O_RDWR is better for code reading.

Signed-off-by: Masatake YAMATO <yamato <at>>
 fs/coredump.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/coredump.c b/fs/coredump.c
index b5c86ff..7ec0d2b 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -635,7 +635,7 @@ void do_coredump(const siginfo_t *siginfo)

 		cprm.file = filp_open(cn.corename,
-				 O_CREAT | 2 | O_NOFOLLOW | O_LARGEFILE | flag,
+				 O_CREAT | O_RDWR | O_NOFOLLOW | O_LARGEFILE | flag,
 				 0600);
 		if (IS_ERR(cprm.file))
 			goto fail_unlock;
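
Editorial sketch showing why the change is purely cosmetic on Linux, where `O_RDWR` has the value 2. The helper names `flags_before()` and `flags_after()` are hypothetical, and `O_NOFOLLOW` and `O_LARGEFILE` are left out of the sketch so that it builds without extra feature-test macros; the comparison does not depend on them.

```c
#include <fcntl.h>

/* Flag expression as written before the patch, with the raw value 2. */
static int flags_before(int flag)
{
	return O_CREAT | 2 | flag;
}

/* Flag expression as written after the patch, with the named macro. */
static int flags_after(int flag)
{
	return O_CREAT | O_RDWR | flag;
}

/* Extract the access mode (O_RDONLY, O_WRONLY or O_RDWR) from a flag word. */
static int access_mode(int flags)
{
	return flags & O_ACCMODE;
}
```

On Linux both helpers return the same flag word for any `flag`, and `access_mode()` of either result is `O_RDWR`, so the patch changes readability rather than behaviour.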

