Theodore Ts'o | 1 Jan 07:42 2010
Picon
Picon

[PATCH 1/2] ext4: Fix accounting of reserved metadata blocks

Commit 0637c6f had a typo which caused the reserved metadata blocks to
not be released correctly.   Fix this.

Signed-off-by: "Theodore Ts'o" <tytso <at> mit.edu>
---
 fs/ext4/inode.c |    9 ++++-----
 1 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 84eeb8f..a26a0b4 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
 <at>  <at>  -1076,9 +1076,8  <at>  <at>  static void ext4_da_update_reserve_space(struct inode *inode, int used)
 		 * only when we have written all of the delayed
 		 * allocation blocks.
 		 */
-		mdb_free = ei->i_allocated_meta_blocks;
-		percpu_counter_sub(&sbi->s_dirtyblocks_counter, mdb_free);
-		ei->i_allocated_meta_blocks = 0;
+		mdb_free = ei->i_reserved_meta_blocks;
+		ei->i_reserved_meta_blocks = 0;
 	}
 	spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);

 <at>  <at>  -1889,8 +1888,8  <at>  <at>  static void ext4_da_release_space(struct inode *inode, int to_free)
 		 * only when we have written all of the delayed
 		 * allocation blocks.
 		 */
-		to_free += ei->i_allocated_meta_blocks;
-		ei->i_allocated_meta_blocks = 0;
(Continue reading)

Theodore Ts'o | 1 Jan 07:42 2010
Picon
Picon

[PATCH 2/2] ext4: Calculate metadata requirements more accurately

In the past, ext4_calc_metadata_amount(), and its sub-functions
ext4_ext_calc_metadata_amount() and ext4_indirect_calc_metadata_amount()
badly over-estimated the number of metadata blocks that might be
required for delayed allocation blocks.  This didn't matter as much
when functions which managed the reserved metadata blocks were more
aggressive about dropping reserved metadata blocks as delayed
allocation blocks were written, but unfortunately they were too
aggressive.  This was fixed in commit 0637c6f, but as a result the
over-estimation by ext4_calc_metadata_amount() would lead to reserving
2-3 times the number of pending delayed allocation blocks as
potentially required metadata blocks.  So if there are 1 megabytes of
blocks which have been not yet been allocation, up to 3 megabytes of
space would get reserved out of the user's quota and from the file
system free space pool until all of the inode's data blocks have been
allocated.

This commit addresses this problem by much more accurately estimating
the number of metadata blocks that will be required.  It will still
somewhat over-estimate the number of blocks needed, since it must make
a worst case estimate not knowing which physical blocks will be
needed, but it is much more accurate than before.

Signed-off-by: "Theodore Ts'o" <tytso <at> mit.edu>
---
 fs/ext4/ext4.h         |    2 +
 fs/ext4/ext4_extents.h |    3 +-
 fs/ext4/extents.c      |   49 ++++++++++++++++++++++++-------------
 fs/ext4/inode.c        |   62 +++++++++++++++++++++++++++--------------------
 fs/ext4/super.c        |    1 +
 5 files changed, 73 insertions(+), 44 deletions(-)
(Continue reading)

Theodore Ts'o | 1 Jan 08:45 2010
Picon
Picon

[PATCH -v2 1/2] ext4: Fix accounting of reserved metadata blocks

Commit 0637c6f had a typo which caused the reserved metadata blocks to
not be released correctly.   Fix this.

Signed-off-by: "Theodore Ts'o" <tytso <at> mit.edu>
---

Oops, I accidentally deleted an extra line that was critically needed in
the earlier -v1 version of this patch.

 fs/ext4/inode.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 84eeb8f..bdaa92a 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
 <at>  <at>  -1076,9 +1076,9  <at>  <at>  static void ext4_da_update_reserve_space(struct inode *inode, int used)
 		 * only when we have written all of the delayed
 		 * allocation blocks.
 		 */
-		mdb_free = ei->i_allocated_meta_blocks;
+		mdb_free = ei->i_reserved_meta_blocks;
+		ei->i_reserved_meta_blocks = 0;
 		percpu_counter_sub(&sbi->s_dirtyblocks_counter, mdb_free);
-		ei->i_allocated_meta_blocks = 0;
 	}
 	spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);

 <at>  <at>  -1889,8 +1889,8  <at>  <at>  static void ext4_da_release_space(struct inode *inode, int to_free)
 		 * only when we have written all of the delayed
(Continue reading)

Theodore Ts'o | 1 Jan 08:46 2010
Picon
Picon

[PATCH -v2 2/2] ext4: Calculate metadata requirements more accurately

In the past, ext4_calc_metadata_amount(), and its sub-functions
ext4_ext_calc_metadata_amount() and ext4_indirect_calc_metadata_amount()
badly over-estimated the number of metadata blocks that might be
required for delayed allocation blocks.  This didn't matter as much
when functions which managed the reserved metadata blocks were more
aggressive about dropping reserved metadata blocks as delayed
allocation blocks were written, but unfortunately they were too
aggressive.  This was fixed in commit 0637c6f, but as a result the
over-estimation by ext4_calc_metadata_amount() would lead to reserving
2-3 times the number of pending delayed allocation blocks as
potentially required metadata blocks.  So if there are 1 megabytes of
blocks which have been not yet been allocation, up to 3 megabytes of
space would get reserved out of the user's quota and from the file
system free space pool until all of the inode's data blocks have been
allocated.

This commit addresses this problem by much more accurately estimating
the number of metadata blocks that will be required.  It will still
somewhat over-estimate the number of blocks needed, since it must make
a worst case estimate not knowing which physical blocks will be
needed, but it is much more accurate than before.

Signed-off-by: "Theodore Ts'o" <tytso <at> mit.edu>
---
 fs/ext4/ext4.h         |    2 +
 fs/ext4/ext4_extents.h |    3 +-
 fs/ext4/extents.c      |   49 ++++++++++++++++++++++++-------------
 fs/ext4/inode.c        |   62 +++++++++++++++++++++++++++--------------------
 fs/ext4/super.c        |    1 +
 5 files changed, 73 insertions(+), 44 deletions(-)
(Continue reading)

bugzilla-daemon | 2 Jan 00:09 2010

[Bug 14972] New: [regression] msync() call on ext4 causes disk thrashing

http://bugzilla.kernel.org/show_bug.cgi?id=14972

           Summary: [regression] msync() call on ext4 causes disk
                    thrashing
           Product: File System
           Version: 2.5
    Kernel Version: 2.6.32.2
          Platform: All
        OS/Version: Linux
              Tree: Mainline
            Status: NEW
          Severity: high
          Priority: P1
         Component: ext4
        AssignedTo: fs_ext4 <at> kernel-bugs.osdl.org
        ReportedBy: t.artem <at> mailcity.com
        Regression: Yes

Created an attachment (id=24398)
 --> (http://bugzilla.kernel.org/attachment.cgi?id=24398)
run this way ./test test.txt 100

The attached application causes useless disk thrashing in case of ext4, but
works normally in case of ext3 (and possibly other FS's).

msync() for unchanged mmap'ed files shouldn't cause disk/HDD activity which I
do observe.

--

-- 
Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email
(Continue reading)

bugzilla-daemon | 2 Jan 00:14 2010

[Bug 14972] [regression] msync() call on ext4 causes disk thrashing

http://bugzilla.kernel.org/show_bug.cgi?id=14972

--- Comment #1 from Artem S. Tashkinov <t.artem <at> mailcity.com>  2010-01-01 23:14:38 ---
Created an attachment (id=24399)
 --> (http://bugzilla.kernel.org/attachment.cgi?id=24399)
.config; dmesg; lspci; hdparm -I

--

-- 
Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

bugzilla-daemon | 2 Jan 14:19 2010

[Bug 14972] [regression] msync() call on ext4 causes disk thrashing

http://bugzilla.kernel.org/show_bug.cgi?id=14972

Theodore Tso <tytso <at> mit.edu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tytso <at> mit.edu

--- Comment #2 from Theodore Tso <tytso <at> mit.edu>  2010-01-02 13:19:23 ---
Regression from what?  Ext3?  Or some earlier kernel version?

What arguments are you giving to this test program of yours?   It looks like it
does some number of reads and/or writes to the file, and then calls 10,000
msyncs with 50ms wait between each msync.

I'm not seeing any disk activity as a result.   Was anything else reading or
writing to the file or to the file system at the same time?

--

-- 
Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

bugzilla-daemon | 2 Jan 16:10 2010

[Bug 14972] [regression] msync() call on ext4 causes disk thrashing

http://bugzilla.kernel.org/show_bug.cgi?id=14972

--- Comment #3 from Artem S. Tashkinov <t.artem <at> mailcity.com>  2010-01-02 15:10:03 ---
(In reply to comment #2)
> Regression from what?  Ext3?  Or some earlier kernel version?

Regression from ext3.

> 
> What arguments are you giving to this test program of yours?   It looks like it
> does some number of reads and/or writes to the file, and then calls 10,000
> msyncs with 50ms wait between each msync.

While doing those msync()s mmap file is unchanged, but ...

> 
> I'm not seeing any disk activity as a result.   Was anything else reading or
> writing to the file or to the file system at the same time?

... my HDD led keeps flashing continuously which certainly means some disk
activity. Should I attach a video showing this abnormality on kernel 2.6.32.2
on runlevel 1 with no application running except bash and this application?

--

-- 
Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo <at> vger.kernel.org
(Continue reading)

bugzilla-daemon | 2 Jan 17:47 2010

[Bug 14972] [regression] msync() call on ext4 causes disk thrashing

http://bugzilla.kernel.org/show_bug.cgi?id=14972

--- Comment #4 from Theodore Tso <tytso <at> mit.edu>  2010-01-02 16:47:35 ---
OK, so that's not technically a regression as far as what the kernel bugzilla
field is concerned.

>... my HDD led keeps flashing continuously which certainly means some disk
>activity. Should I attach a video showing this abnormality on kernel 2.6.32.2
>on runlevel 1 with no application running except bash and this application?

What would be much more useful would be to install blktrace, and then attach
the output of "btrace /dev/sdXX" while this application is running.

When I do the test, I am seeing some excess write barriers which we can
optimize away:

254,3    0     1148    31.613064584 10904  Q  WB [test]
254,3    0     1149    31.664270330 10904  Q  WB [test]
254,3    0     1150    31.715259078 10904  Q  WB [test]
254,3    0     1151    31.772662156 10904  Q  WB [test]
254,3    0     1152    31.827932269 10904  Q  WB [test]
254,3    0     1153    31.883122551 10904  Q  WB [test]

but that's not a disaster.   (If there is no pending writes from other
applications, this won't cause any extra hard drive activity.)

I'd like to confirm whether you are seeing anything more, since at least for me
on my system, empty write barriers don't cause the hard drive activity light to
go on.   Maybe it does for your system, though.  I'd like to confirm this since
because if it's just a matter of extra (unnecessary) write barriers, we would
(Continue reading)

bugzilla-daemon | 2 Jan 17:48 2010

[Bug 14972] [regression] msync() call on ext4 causes disk thrashing

http://bugzilla.kernel.org/show_bug.cgi?id=14972

Theodore Tso <tytso <at> mit.edu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Regression|Yes                         |No

--

-- 
Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Gmane