Bob Peterson | 22 Oct 03:13 2014

[PATCH] GFS2: Speed up fiemap function by skipping holes


This is the GFS2 companion patch to the one I previously posted for fiemap.

Patch description:
This patch detects the new want_holesize bit in block_map requests.
If a hole is found during fiemap, it calculates the size of the hole
based on the current metapath information, then it sets the new
buffer_got_holesize bit and returns the hole size in b_size.
Since the metapath only represents a section of the file, it can
only extrapolate to a certain size based on the current metapath
buffers. Therefore, fiemap may call blockmap several times to get
the hole size. The hole size is determined by a new function.
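
As a rough illustration of the idea (not the function from this patch), the sketch below sizes a hole from a single indirect block. The names `blocks_per_pointer` and `hole_size_in_block` are hypothetical, and it assumes a tree where every indirect block holds the same number of pointers and where the lookup starts exactly at the first block covered by a null pointer.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Number of data blocks addressed by ONE pointer found at tree level
 * `level` (0 = top of the tree) when the tree is `height` levels deep
 * and every indirect block holds `ptrs_per_block` pointers.
 * Hypothetical helper, for illustration only.
 */
static uint64_t blocks_per_pointer(unsigned height, unsigned level,
                                   unsigned ptrs_per_block)
{
    uint64_t factor = 1;

    for (unsigned i = level + 1; i < height; i++)
        factor *= ptrs_per_block;
    return factor;
}

/*
 * Size (in data blocks) of the hole that starts at pointer index `start`
 * of one indirect block: the run of null pointers times the number of
 * blocks each pointer covers. The run stops at the end of this buffer,
 * so a larger hole needs another call with the next buffer.
 */
static uint64_t hole_size_in_block(const uint64_t *ptrs, size_t nptrs,
                                   size_t start, uint64_t factor)
{
    size_t i = start;

    while (i < nptrs && ptrs[i] == 0)
        i++;
    return (uint64_t)(i - start) * factor;
}
```

Because the run of null pointers can only be counted up to the end of the buffer in hand, a caller has to ask again at the next offset, which is the "several calls" behaviour described above.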


Bob Peterson
Red Hat File Systems

Signed-off-by: Bob Peterson <rpeterso <at>> 
 fs/gfs2/bmap.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 68 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index f0b945a..450ea17 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -587,6 +587,62 @@ static int gfs2_bmap_alloc(struct inode *inode, const sector_t lblock,

Bob Peterson | 22 Oct 03:09 2014

[PATCH][try6] VFS: new want_holesize and got_holesize buffer_head flags for fiemap


This is my sixth rework of this patch. The problem with the previous
version is that the underlying file system's block_map function
may set the buffer_head's b_state to 0, which may be misinterpreted
by fiemap as meaning it has returned the hole size in b_size.
This version implements a suggestion from Steve Whitehouse: it
improves on the design by (unfortunately) using two flags: a flag to
tell block_map to return hole size if possible, and another flag to
tell fiemap that the hole size has been returned. This way there is
no possibility of a misunderstanding.

The problem:
If you do a fiemap operation on a very large sparse file, it can take
an extremely long amount of time (we're talking days here) because
function __generic_block_fiemap does a block-for-block search when it
encounters a hole.

The solution:
Allow the underlying file system to return the hole size so that
function __generic_block_fiemap can quickly skip the hole. I have a
companion patch to GFS2 that takes advantage of the new flags to
speed up fiemap on sparse GFS2 files. Other file systems can do the
same as they see fit. For GFS2, the time it takes to skip a 1PB hole
in a sparse file goes from days to milliseconds.
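
To make the difference concrete, here is a small userspace model, not kernel code. Everything in it (`toy_extent`, `toy_block_map`, `toy_scan`) is a hypothetical stand-in: the "file system" is a sorted array of mapped extents, and the scan loop counts how many mapping calls it needs with and without the hole-size hint.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_extent { uint64_t start, len; };   /* mapped range, in blocks */

struct toy_map_result {
    bool mapped;         /* block is backed by data */
    bool got_holesize;   /* `size` holds the length of a hole */
    uint64_t size;       /* blocks covered by this answer */
};

/* Toy block_map: `ext` must be sorted by start and non-overlapping. */
static struct toy_map_result
toy_block_map(const struct toy_extent *ext, size_t n, uint64_t file_blocks,
              uint64_t lblock, bool want_holesize)
{
    struct toy_map_result r = { false, false, 1 };
    uint64_t hole_end = file_blocks;          /* hole may run to EOF */

    for (size_t i = 0; i < n; i++) {
        if (lblock >= ext[i].start && lblock < ext[i].start + ext[i].len) {
            r.mapped = true;
            r.size = ext[i].start + ext[i].len - lblock;
            return r;
        }
        if (lblock < ext[i].start) {          /* hole ends at next extent */
            hole_end = ext[i].start;
            break;
        }
    }
    if (want_holesize) {
        r.got_holesize = true;
        r.size = hole_end - lblock;
    }
    return r;
}

/* Walk the whole file and return how many mapping calls were needed. */
static uint64_t toy_scan(const struct toy_extent *ext, size_t n,
                         uint64_t file_blocks, bool want_holesize)
{
    uint64_t calls = 0, lblock = 0;

    while (lblock < file_blocks) {
        struct toy_map_result r =
            toy_block_map(ext, n, file_blocks, lblock, want_holesize);

        calls++;
        /* Without a hole size, the only safe step is one block. */
        lblock += (r.mapped || r.got_holesize) ? r.size : 1;
    }
    return calls;
}
```

With two small extents in a two-million-block file, the block-by-block scan makes one call per hole block, while the hinted scan makes one call per extent and one per hole.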

Patch description:
This patch changes function __generic_block_fiemap so that it sets a new
buffer_want_holesize bit. The new bit signals to the underlying file system
to return a hole size from its block_map function (if possible) in the

Jan Kara | 21 Oct 17:12 2014

[PATCH] seq_file: Remove pointless assignment in seq_read()

The value assigned to 'err' in seq_read() is overwritten by the result
of copy_to_user(). This is correct because we know we have succeeded in
generating at least one entry into the user buffer, so the error we got
when generating further entries is irrelevant. Just remove the assignment.

Coverity-id: 1226981
Signed-off-by: Jan Kara <jack <at>>
 fs/seq_file.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/seq_file.c b/fs/seq_file.c
index 3857b720cb1b..cf53252e4784 100644
--- a/fs/seq_file.c
+++ b/fs/seq_file.c
@@ -262,10 +262,8 @@ Fill:
 		size_t offs = m->count;
 		loff_t next = pos;
 		p = m->op->next(m, p, &next);
-		if (!p || IS_ERR(p)) {
-			err = PTR_ERR(p);
+		if (!p || IS_ERR(p))
 			break;
-		}
 		err = m->op->show(m, p);
 		if (seq_overflow(m) || err) {
 			m->count = offs;



Al Viro | 21 Oct 03:13 2014

[RFC] lustre treatment of dentry->d_name

	a) what protects ->d_name in ll_intent_file_open()?  It copies
->d_name.name and ->d_name.len into local variables and proceeds to
use those; what's to guarantee that dentry won't get hit with d_move()
halfway through that?  None of the locks that would give an exclusion
against d_move() appear to be held...

	b) what stabilizes *dentryp->d_name in do_statahead_enter()?

	c) what stabilizes fdentry->d_parent->d_name in llog_lvfs_destroy()?

Unless I'm missing something subtle, all three can race with d_move(),
with obvious unpleasant results.  The next bunch doesn't (the callers
are holding ->i_mutex on parents), but it's also bloody odd - why are we
playing these games in llite/namei.c?

static int ll_mkdir(struct inode *dir, struct dentry *dentry, ll_umode_t mode)
{
        return ll_mkdir_generic(dir, &dentry->d_name, mode, dentry);
}
static int ll_mkdir_generic(struct inode *dir, struct qstr *name,
                            int mode, struct dentry *dchild)
{
        int err;

        CDEBUG(D_VFSTRACE, "VFS Op:name=%.*s,dir=%lu/%u(%p)\n",
               name->len, name->name, dir->i_ino, dir->i_generation, dir);

        if (!IS_POSIXACL(dir) || !exp_connect_umask(ll_i2mdexp(dir)))
                mode &= ~current_umask();

Mike Frysinger | 20 Oct 01:03 2014

[PATCH 1/2] binfmt_misc: add comments & debug logs

When trying to develop a custom format handler, the errors returned all
effectively get bucketed as EINVAL with no kernel messages.  The other
errors (ENOMEM/EFAULT) are internal/obvious and basic.  Thus any time a
bad handler is rejected, the developer has to walk the dense code and
try to guess where it went wrong.  Needing to dive into kernel code is
itself a fairly high barrier for a lot of people.

To improve this situation, let's deploy extensive pr_debug markers at
logical parse points, and add comments to the dense parsing logic.  It
lets you see exactly where the parsing aborts, the string the kernel
received (useful when dealing with shell code), how it translated the
buffers to binary data, and how it will apply the mask at runtime.
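
For readers unfamiliar with the two steps mentioned above (translating the escaped string to binary, then applying the mask), here is a small userspace sketch. `toy_unescape` and `toy_masked_match` are hypothetical helpers written for illustration; they are not the functions in fs/binfmt_misc.c, and only the `\xNN` escape form is handled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if `c` is not a hex digit. */
static int hex_val(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Decode "\xNN" escapes from `src` into `dst`; any other character is
 * copied as-is. Returns the decoded length, or -1 if `dst` is too small.
 */
static int toy_unescape(const char *src, unsigned char *dst, size_t cap)
{
    size_t n = 0;

    while (*src) {
        unsigned char byte;

        if (src[0] == '\\' && src[1] == 'x' &&
            hex_val(src[2]) >= 0 && hex_val(src[3]) >= 0) {
            byte = (unsigned char)(hex_val(src[2]) * 16 + hex_val(src[3]));
            src += 4;
        } else {
            byte = (unsigned char)*src++;
        }
        if (n == cap)
            return -1;
        dst[n++] = byte;
    }
    return (int)n;
}

/* True if `data` equals `magic` in every bit position selected by `mask`. */
static bool toy_masked_match(const unsigned char *data,
                             const unsigned char *magic,
                             const unsigned char *mask, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if ((data[i] ^ magic[i]) & mask[i])
            return false;
    return true;
}
```

A mask byte of 0x00 means "ignore this position", so a header only has to agree with the magic where the mask has bits set.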

Some example output:
$ echo
> register
$ dmesg
binfmt_misc: register: received 92 bytes
binfmt_misc: register: delim: 0x3a {:}
binfmt_misc: register: name: {qemu-foo}
binfmt_misc: register: type: M (magic)
binfmt_misc: register: offset: 0x0
binfmt_misc: register: magic[raw]: 5c 78 37 66 45 4c 46 5c 78 41 44 5c 78 41 44 5c  \x7fELF\xAD\xAD\
binfmt_misc: register: magic[raw]: 78 30 31 5c 78 30 30 00                          x01\x00.
binfmt_misc: register:  mask[raw]: 5c 78 66 66 5c 78 66 66 5c 78 66 66 5c 78 66 66  \xff\xff\xff\xff
binfmt_misc: register:  mask[raw]: 5c 78 66 66 5c 78 30 30 5c 78 66 66 5c 78 30 30  \xff\x00\xff\x00
binfmt_misc: register:  mask[raw]: 00                                               .
binfmt_misc: register: magic/mask length: 8
binfmt_misc: register: magic[decoded]: 7f 45 4c 46 ad ad 01 00                          .ELF....

Boaz Harrosh | 19 Oct 19:26 2014

[PATCH 1/2] MAINTAINERS: Change Boaz Harrosh's email

From: Boaz Harrosh <ooo <at>>

I have moved on, and no longer have Panasas email access.
Update to an email that can reach me.

So change bharrosh <at> => ooo <at>

Explanation of the email address:
* is a domain owned by me.
* ooo - Stands for Open Osd . Org

Another email alias that can be used is:
	openosd <at>

CC: Greg KH <gregkh <at>>
Signed-off-by: Boaz Harrosh <ooo <at>>
 1 file changed, 1 insertion(+), 1 deletion(-)

index f10ed39..fc2c9a8 100644
@@ -6725,7 +6725,7 @@ S:	Orphan
 F:	drivers/net/wireless/orinoco/
-M:	Boaz Harrosh <bharrosh <at>>
+M:	Boaz Harrosh <ooo <at>>

Namjae Jeon | 17 Oct 13:27 2014

[PATCH v6 4/4] Documentation/filesystems/vfat.txt: update the limitation for fat fallocate

Update the limitation for fat fallocate.

Signed-off-by: Namjae Jeon <namjae.jeon <at>>
Signed-off-by: Amit Sahrawat <a.sahrawat <at>>
 Documentation/filesystems/vfat.txt | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/Documentation/filesystems/vfat.txt b/Documentation/filesystems/vfat.txt
index ce1126a..223c321 100644
--- a/Documentation/filesystems/vfat.txt
+++ b/Documentation/filesystems/vfat.txt
@@ -180,6 +180,16 @@ dos1xfloppy  -- If set, use a fallback default BIOS Parameter Block

 <bool>: 0,1,yes,no,true,false

+* The fallocated region of file is discarded at umount/evict time
+  when using fallocate with FALLOC_FL_KEEP_SIZE.
+  So, User should assume that fallocated region can be discarded at
+  last close if there is memory pressure resulting in eviction of
+  the inode from the memory. As a result, for any dependency on
+  the fallocated region, user should make sure to recheck fallocate
+  after reopening the file.
 * Need to get rid of the raw scanning stuff.  Instead, always use


Namjae Jeon | 17 Oct 13:26 2014

[PATCH v6 3/4] fat: permit to return phy block number by fibmap in fallocated region

Make the fibmap call return the proper physical block number for any
offset request in the fallocated range.
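
The mapping arithmetic can be sketched in userspace as follows. This is a toy model, not the patch's code: `toy_fat_geom` and `toy_fat_bmap` are hypothetical, the FAT chain is represented as a plain array indexed by file-relative cluster number, and the cluster-to-block formula assumes the usual FAT layout where data clusters are numbered from 2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_FAT_START_ENT 2u   /* first data cluster number on FAT */

struct toy_fat_geom {
    unsigned cluster_bits;     /* log2(cluster size in bytes) */
    unsigned blocksize_bits;   /* log2(sector size in bytes) */
    unsigned sec_per_clus;     /* sectors per cluster, power of two */
    uint64_t data_start;       /* first sector of the data area */
};

/*
 * Translate a file-relative sector into an on-disk sector.
 * chain[i] is the disk cluster holding the file's i-th cluster.
 * Returns 0 when the sector lies beyond the allocated chain.
 */
static uint64_t toy_fat_bmap(const struct toy_fat_geom *g,
                             const uint32_t *chain, size_t chain_len,
                             uint64_t sector)
{
    assert((g->sec_per_clus & (g->sec_per_clus - 1)) == 0);

    /* Which cluster of the file, and which sector inside that cluster. */
    uint64_t index = sector >> (g->cluster_bits - g->blocksize_bits);
    uint64_t offset = sector & (g->sec_per_clus - 1);

    if (index >= chain_len)
        return 0;

    /* First sector of the disk cluster, then step to the wanted sector. */
    uint64_t first = ((uint64_t)chain[index] - TOY_FAT_START_ENT)
                     * g->sec_per_clus + g->data_start;
    return first + offset;
}
```

In this model a preallocated cluster is simply one more entry in `chain`, so a lookup inside the fallocated range resolves like any other mapped sector instead of returning 0.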

Signed-off-by: Namjae Jeon <namjae.jeon <at>>
Signed-off-by: Amit Sahrawat <a.sahrawat <at>>
 fs/fat/cache.c | 38 +++++++++++++++++++++++++-------------
 fs/fat/fat.h   |  3 +++
 fs/fat/inode.c | 33 ++++++++++++++++++++++++++++++++-
 3 files changed, 60 insertions(+), 14 deletions(-)

diff --git a/fs/fat/cache.c b/fs/fat/cache.c
index 91ad9e1..d3dd5ba 100644
--- a/fs/fat/cache.c
+++ b/fs/fat/cache.c
@@ -303,6 +303,29 @@ static int fat_bmap_cluster(struct inode *inode, int cluster)
 	return dclus;
 }

+int fat_get_mapped_cluster(struct inode *inode, sector_t sector,
+			   sector_t last_block,
+			   unsigned long *mapped_blocks, sector_t *bmap)
+{
+	struct super_block *sb = inode->i_sb;
+	struct msdos_sb_info *sbi = MSDOS_SB(sb);
+	int cluster, offset;
+	cluster = sector >> (sbi->cluster_bits - sb->s_blocksize_bits);
+	offset  = sector & (sbi->sec_per_clus - 1);
+	cluster = fat_bmap_cluster(inode, cluster);

Namjae Jeon | 17 Oct 13:26 2014

[PATCH v6 2/4] fat: skip cluster allocation on fallocated region

Skip new cluster allocation after checking the i_blocks limit in
__fat_get_block, because the blocks are already allocated in the
fallocated region.
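
The check can be read as a small predicate. The sketch below mirrors the condition in the hunk that follows, using a hypothetical helper name (`toy_needs_new_cluster`) and plain integers in place of the inode and superblock fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether writing block `iblock` requires a fresh cluster.
 * A new cluster is needed only when the block is the first one of a
 * cluster AND it lies at or beyond the blocks the file already owns
 * (`i_blocks`), i.e. it is not inside a preallocated region.
 */
static bool toy_needs_new_cluster(uint64_t iblock, uint64_t i_blocks,
                                  unsigned sec_per_clus)
{
    assert((sec_per_clus & (sec_per_clus - 1)) == 0);  /* power of two */

    uint64_t offset = iblock & (sec_per_clus - 1);

    return offset == 0 && !(iblock < i_blocks);
}
```

With 8 sectors per cluster and 16 blocks already owned, block 8 starts a cluster but is already backed, so nothing is allocated; block 16 is the first block past the owned range and does trigger an allocation.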

Signed-off-by: Namjae Jeon <namjae.jeon <at>>
Signed-off-by: Amit Sahrawat <a.sahrawat <at>>
 fs/fat/inode.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/fat/inode.c b/fs/fat/inode.c
index acb45ce..20e9fe5 100644
--- a/fs/fat/inode.c
+++ b/fs/fat/inode.c
@@ -144,7 +144,12 @@ static inline int __fat_get_block(struct inode *inode, sector_t iblock,

 	offset = (unsigned long)iblock & (sbi->sec_per_clus - 1);
-	if (!offset) {
+	/*
+	 * allocate a cluster according to the following.
+	 * 1) no more available blocks
+	 * 2) not part of fallocate region
+	 */
+	if (!offset && !(iblock < (sector_t)inode->i_blocks)) {
 		/* TODO: multiple cluster allocation would be desirable. */
 		err = fat_add_cluster(inode);
 		if (err)



Namjae Jeon | 17 Oct 13:26 2014

[PATCH v6 1/4] fat: add fat_fallocate operation

Implement preallocation via the fallocate syscall on VFAT partitions.
This patch is based on an earlier patch of the same name which had some
issues detailed below and did not get accepted.  Refer

a) The preallocated space was not persistent when the
   FALLOC_FL_KEEP_SIZE flag was set.  The clusters were deallocated at
   evict time.

b) There was no need to zero out the clusters when the flag was set.
   Instead of doing an expanding truncate, just allocate clusters and add
   them to the fat chain.  This reduces preallocation time.
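
The two behaviours can be contrasted with a toy model. This is only an illustration of the semantics described in this message, not the patch's implementation; `toy_fat_file` and `toy_fallocate` are hypothetical, and the file is reduced to a byte size plus a count of clusters in its chain.

```c
#include <stdbool.h>
#include <stdint.h>

struct toy_fat_file {
    uint64_t size;       /* file size in bytes, as seen by stat() */
    uint64_t clusters;   /* clusters currently linked into the chain */
};

/*
 * Model of preallocating [offset, offset + len).
 * keep_size == true : only extend the cluster chain; the size is
 *                     untouched and no cluster is zeroed.
 * keep_size == false: behaves like an expanding truncate, so the size
 *                     grows to cover the range as well.
 */
static void toy_fallocate(struct toy_fat_file *f, uint64_t offset,
                          uint64_t len, bool keep_size,
                          uint64_t cluster_size)
{
    uint64_t end = offset + len;
    uint64_t needed = (end + cluster_size - 1) / cluster_size;

    if (needed > f->clusters)
        f->clusters = needed;    /* append clusters to the chain */
    if (!keep_size && end > f->size)
        f->size = end;           /* expanding truncate */
}
```

In the keep-size case the file ends up owning more clusters than its size implies, which is the state the compatibility notes below are concerned with.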

Compatibility with windows:

There are no issues when FALLOC_FL_KEEP_SIZE is not set because it just
does an expanding truncate.  Thus reading from the preallocated area on
Windows returns null until data is written to it.

When a file with a preallocated area created using FALLOC_FL_KEEP_SIZE
was written to on Windows, the Windows driver freed the preallocated
clusters and allocated new clusters for the new data.  The freed
clusters are reflected in the free space available for the partition,
which can be seen in the Volume properties.

The windows chkdsk tool also does not report any errors on a disk
containing files with preallocated space.

There is also no issue with the Linux fat fsck, because it discards
preallocated clusters at repair time.