Coly Li | 1 Aug 2011 01:52

Re: [PATCH 0/2] Add inode checksum support to ext4

On 2011-07-31 15:08, Joel Becker wrote:
> On Sat, Jul 30, 2011 at 03:25:32PM +0800, Coly Li wrote:
>> On 2011-07-29 21:19, Joel Becker wrote:
>>> On Fri, Jul 29, 2011 at 03:48:45AM -0600, Andreas Dilger wrote:
>>>> On 2011-07-28, at 4:07 PM, Joel Becker wrote:
>>>>> 	We use ethernet crc32 in ocfs2.  btrfs uses crc32c.  Frankly, I
>>>>> could have used crc32c if I'd really thought about the hardware
>>>>> acceleration benefits.  I think it's a good idea for ext4.
>>>>
>>>> The problem with crc32[c] is that if you don't have hardware acceleration
>>>> it is terribly slow.
>>>
>>> 	We find ethernet crc32 just fine in ocfs2.  I use the kernel's
>>> implementation, which survives everyone's network traffic, and of course
>>> we added the triggers to jbd2 so we only have to do the calculations on
>>> read and write.
>>>
>>
>> Ext4 supports non-journal mode, and there are a few users (Google, Taobao, etc.).
>> A trigger of jbd2 may not work well for non-journal Ext4 ...
>>
>> And in non-journal mode, there is no copy of any metadata block in jbd2, so we need to be
>> more careful with checksumming, e.g. of inode/block bitmap blocks...
> 
> 	Sure, but you could use a trigger in journaled mode and then do
> the checksums directly in the __ext4_handle_journal_dirty_*() functions
> in non-journaled mode.  Sure, it would be a little more CPU time, but
> the user picked "checksums + no journal" at mkfs time.
> 

Eric Whitney | 1 Aug 2011 04:57

2.6.39 and 3.0 scalability measurement results

I've posted the results of my 2.6.38/2.6.39 and 2.6.39/3.0 ext4 
scalability measurements and comparisons on a 48 core x86_64 server at:

http://free.linux.hp.com/~enw/ext4/2.6.39

http://free.linux.hp.com/~enw/ext4/3.0

The results include throughput and CPU efficiency graphs for five simple 
workloads, the raw data for same, and lockstats as well.

The data cover ext4 filesystems with and without journals.  For 
reference, ext3, xfs, and btrfs are included as well.

The 2.6.38/2.6.39 results mainly show the clear scalability benefit of 
making the mblk_io_submit mount option default behavior for ext4 
filesystems with journals - see the large_file_creates throughput plot.

In the way of more recent news, the 2.6.39/3.0 results indicate little 
change for ext4 either with or without journals.

Thanks,
Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Joel Becker | 1 Aug 2011 06:57

Re: [PATCH 0/2] Add inode checksum support to ext4

On Mon, Aug 01, 2011 at 07:52:41AM +0800, Coly Li wrote:
> On 2011-07-31 15:08, Joel Becker wrote:
> > On Sat, Jul 30, 2011 at 03:25:32PM +0800, Coly Li wrote:
> >> And in non-journal mode, there is no copy of any metadata block in jbd2, so we need to be
> >> more careful with checksumming, e.g. of inode/block bitmap blocks...
> > 
> > 	Sure, but you could use a trigger in journaled mode and then do
> > the checksums directly in the __ext4_handle_journal_dirty_*() functions
> > in non-journaled mode.  Sure, it would be a little more CPU time, but
> > the user picked "checksums + no journal" at mkfs time.
> > 
> 
> Yes, my idea was similar to yours.
> One thing that is not clear to me is how, in non-journal mode, to keep the page of a bitmap block stable.
> Because bit setting in the Ext4 bitmap is non-locking, a new bit might be set after the checksum is calculated.

	Every place that changes the bits will eventually call
ext4_journal_dirty(), which recalculates the checksum.  So there's no
danger of a set-bit-after-last-checksum.  But you will have to lock
around the checksum calculation in non-journaling mode.  JBD2 handles it
for journaling mode.
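
A minimal userspace sketch of the scheme described above, under stated assumptions: struct bitmap_block, block_lock(), block_unlock() and bitmap_mark_dirty_locked() are hypothetical names made up for illustration, not ext4 or JBD2 functions. A bitwise CRC-32 stands in for the kernel's crc32 code, and a C11 atomic_flag spinlock stands in for whatever lock the non-journaling path would take. Every bit change ends in the "dirty" step, which recalculates the checksum, and the calculation runs with the lock held so that no bit can change underneath it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BITMAP_BYTES 32

/* Hypothetical stand-in for a bitmap block and its stored checksum. */
struct bitmap_block {
	atomic_flag lock;            /* simple spinlock guarding bits + csum */
	uint8_t bits[BITMAP_BYTES];
	uint32_t csum;
};

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320); slow but table-free. */
static uint32_t crc32_bitwise(const uint8_t *buf, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	for (size_t i = 0; i < len; i++) {
		crc ^= buf[i];
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return ~crc;
}

static void block_lock(struct bitmap_block *b)
{
	while (atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire))
		; /* spin until the holder releases the lock */
}

static void block_unlock(struct bitmap_block *b)
{
	atomic_flag_clear_explicit(&b->lock, memory_order_release);
}

static void bitmap_init(struct bitmap_block *b)
{
	atomic_flag_clear(&b->lock);
	memset(b->bits, 0, sizeof(b->bits));
	b->csum = crc32_bitwise(b->bits, sizeof(b->bits));
}

/* Stand-in for the "dirty" step: recalculate the checksum, lock held. */
static void bitmap_mark_dirty_locked(struct bitmap_block *b)
{
	b->csum = crc32_bitwise(b->bits, sizeof(b->bits));
}

/* Every bit change ends in the dirty step, so the checksum never lags. */
static void bitmap_set_bit(struct bitmap_block *b, unsigned int nr)
{
	assert(nr < BITMAP_BYTES * 8);
	block_lock(b);
	b->bits[nr / 8] |= (uint8_t)(1u << (nr % 8));
	bitmap_mark_dirty_locked(b);
	block_unlock(b);
}

/* Returns nonzero when the stored checksum matches the current bits. */
static int bitmap_verify(struct bitmap_block *b)
{
	int ok;

	block_lock(b);
	ok = (b->csum == crc32_bitwise(b->bits, sizeof(b->bits)));
	block_unlock(b);
	return ok;
}
```

In this sketch the bit change and the recalculation share one critical section, which is stricter than the text requires. With journaling, the message above says JBD2 takes care of this instead.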

Joel

-- 

"The whole principle is wrong; it's like demanding that grown men 
 live on skim milk because the baby can't eat steak."
        - author Robert A. Heinlein on censorship

Joel Becker | 1 Aug 2011 07:04

Re: [PATCH 0/2] Add inode checksum support to ext4

On Sun, Jul 31, 2011 at 09:57:11PM -0700, Joel Becker wrote:
> On Mon, Aug 01, 2011 at 07:52:41AM +0800, Coly Li wrote:
> > On 2011-07-31 15:08, Joel Becker wrote:
> > > On Sat, Jul 30, 2011 at 03:25:32PM +0800, Coly Li wrote:
> > >> And in non-journal mode, there is no copy of any metadata block in jbd2, so we need to be
> > >> more careful with checksumming, e.g. of inode/block bitmap blocks...
> > > 
> > > 	Sure, but you could use a trigger in journaled mode and then do
> > > the checksums directly in the __ext4_handle_journal_dirty_*() functions
> > > in non-journaled mode.  Sure, it would be a little more CPU time, but
> > > the user picked "checksums + no journal" at mkfs time.
> > > 
> > 
> > Yes, my idea was similar to yours.
> > One thing that is not clear to me is how, in non-journal mode, to keep the page of a bitmap block stable.
> > Because bit setting in the Ext4 bitmap is non-locking, a new bit might be set after the checksum is calculated.
> 
> 	Every place that changes the bits will eventually call
> ext4_journal_dirty(), which recalculates the checksum.  So there's no
> danger of a set-bit-after-last-checksum.  But you will have to lock
> around the checksum calculation in non-journaling mode.  JBD2 handles it
> for journaling mode.

	Wait, bitsetting in ext4 can't be non-locking.  Or are they
crazily stomping on memory?  I sure see an assert_spin_locked() in
mb_mark_used().
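
A rough userspace analogue of that pattern, as a sketch only: struct group_lock, assert_group_locked() and mark_used() are hypothetical names, and the "lock" is a single-threaded bookkeeping flag, not a real spinlock. It only illustrates how a bit-setting helper can assert that its caller already holds the lock, which is the role assert_spin_locked() plays in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lock that only records whether it is currently held. */
struct group_lock {
	bool held;
};

static void group_lock_acquire(struct group_lock *l)
{
	assert(!l->held);   /* no recursion in this toy model */
	l->held = true;
}

static void group_lock_release(struct group_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* Userspace stand-in for the kernel's assert_spin_locked(). */
#define assert_group_locked(l) assert((l)->held)

/*
 * Marks bits [start, start + len) as used. The caller must hold the
 * lock; the assertion turns a locking mistake into an immediate abort
 * instead of a silently corrupted bitmap.
 */
static void mark_used(struct group_lock *l, uint8_t *bitmap,
		      unsigned int start, unsigned int len)
{
	assert_group_locked(l);
	for (unsigned int i = start; i < start + len; i++)
		bitmap[i / 8] |= (uint8_t)(1u << (i % 8));
}
```

Calling mark_used() without group_lock_acquire() first would trip the assertion, which is the property the question above relies on.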

Joel

Joe Perches | 1 Aug 2011 07:42

[PATCH] ext4: Use current logging styles with "EXT4-fs: " prefix

Always use "EXT4-fs: " logging message prefix so dmesg can be more
easily grepped for EXT4 specific messages.

Add #define pr_fmt(fmt) "EXT4-fs: " fmt
Whitespace neatening.
Add ccflags-$(CONFIG_EXT4_FS) to always enable -DDEBUG for pr_debug.
Convert printks to pr_<level>.  Convert some bare printks to pr_cont.
Consolidate split printks.
Coalesce long formats.
Convert embedded function names to "%s", __func__.
Neaten macro definitions that use fmt, arg...
Correct macro definition of ext4_grp_locked_error to add ino and block.
Add do {} while (0) to some macros, delete it from single lines.
Add terminating newlines to a couple of messages.
Correct grammar in a couple of messages.
Add casts to (unsigned long long) in 3 messages in xattr.c
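
For readers unfamiliar with the pr_fmt convention, here is a small userspace sketch of how the prefix define works. Only the pr_fmt line is taken from the description above; pr_warn_buf() and log_remount_refused() are hypothetical helpers that format into a buffer so the result can be inspected, and the message text is made up. The variadic macro uses the GNU ##__VA_ARGS__ extension, which gcc and clang accept.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Defined before any logging macro expands it. */
#define pr_fmt(fmt) "EXT4-fs: " fmt

/*
 * Hypothetical userspace stand-in for pr_warn(): string-literal
 * concatenation glues the prefix onto the caller's format string.
 */
#define pr_warn_buf(buf, size, fmt, ...) \
	snprintf(buf, size, pr_fmt(fmt), ##__VA_ARGS__)

/* Uses "%s", __func__ instead of an embedded function name. */
static int log_remount_refused(char *buf, size_t size, const char *dev)
{
	return pr_warn_buf(buf, size, "%s: %s: cannot remount read-write\n",
			   dev, __func__);
}
```

Because the prefix is applied at the macro level, every message in a file that defines pr_fmt this way starts with the same string, which is what makes grepping dmesg easier.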

Signed-off-by: Joe Perches <joe@perches.com>
---
 fs/ext4/Makefile         |    2 +
 fs/ext4/balloc.c         |   12 ++-
 fs/ext4/block_validity.c |   11 ++-
 fs/ext4/dir.c            |    5 +-
 fs/ext4/ext4.h           |  101 ++++++++++++-----------
 fs/ext4/extents.c        |   55 ++++++-------
 fs/ext4/ialloc.c         |   41 +++++-----
 fs/ext4/inode.c          |   30 ++++---
 fs/ext4/mballoc.c        |  136 ++++++++++++++------------------
 fs/ext4/mballoc.h        |   15 ++--
 fs/ext4/move_extent.c    |   24 +++---
Toshiyuki Okajima | 1 Aug 2011 06:54

[PATCH] ext3: fix message in ext3_remount for rw-remount case

If there are some inodes in the orphan list while a filesystem is
mounted read-only, we should recommend that people umount and then
mount it when they try to remount it read-write. But the current
message/comment recommends that they umount and then remount it.

ext3_remount:
	/*
	 * If we have an unprocessed orphan list hanging
	 * around from a previously readonly bdev mount,
	 * require a full umount/remount for now.
                          ^^^^^^^^^^^^^^
	 */
	if (es->s_last_orphan) {
		printk(KERN_WARNING "EXT3-fs: %s: couldn't "
			"remount RDWR because of unprocessed "
			"orphan inode list.  Please "
			"umount/remount instead.\n",
                         ^^^^^^^^^^^^^^
			sb->s_id);

Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
---
 fs/ext3/super.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/ext3/super.c b/fs/ext3/super.c
index 7beb69a..d3df0d4 100644
--- a/fs/ext3/super.c
+++ b/fs/ext3/super.c
@@ -2669,13 +2669,13 @@ static int ext3_remount (struct super_block * sb, int * flags, char * data)
Toshiyuki Okajima | 1 Aug 2011 06:56

[PATCH] ext4: fix message in ext4_remount for rw-remount case

If there are some inodes in the orphan list while a filesystem is
mounted read-only, we should recommend that people umount and then
mount it when they try to remount it read-write. But the current
message/comment recommends that they umount and then remount it.

ext4_remount:
	/*
	 * If we have an unprocessed orphan list hanging
	 * around from a previously readonly bdev mount,
	 * require a full umount/remount for now.
                          ^^^^^^^^^^^^^^
	 */
	if (es->s_last_orphan) {
		ext4_msg(sb, KERN_WARNING, "Couldn't "
				"remount RDWR because of unprocessed "
				"orphan inode list.  Please "
				"umount/remount instead");
                                 ^^^^^^^^^^^^^^

Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
---
 fs/ext4/super.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 9ea71aa..c518522 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -4390,13 +4390,13 @@ static int ext4_remount(struct super_block *sb, int *flags, char *data)
 			/*
Coly Li | 1 Aug 2011 09:16

Re: [PATCH 0/2] Add inode checksum support to ext4

On 2011-08-01 13:04, Joel Becker wrote:
> On Sun, Jul 31, 2011 at 09:57:11PM -0700, Joel Becker wrote:
>> On Mon, Aug 01, 2011 at 07:52:41AM +0800, Coly Li wrote:
>>> On 2011-07-31 15:08, Joel Becker wrote:
>>>> On Sat, Jul 30, 2011 at 03:25:32PM +0800, Coly Li wrote:
>>>>> And in non-journal mode, there is no copy of any metadata block in jbd2, so we need to be
>>>>> more careful with checksumming, e.g. of inode/block bitmap blocks...
>>>>
>>>> 	Sure, but you could use a trigger in journaled mode and then do
>>>> the checksums directly in the __ext4_handle_journal_dirty_*() functions
>>>> in non-journaled mode.  Sure, it would be a little more CPU time, but
>>>> the user picked "checksums + no journal" at mkfs time.
>>>>
>>>
>>> Yes, my idea was similar to yours.
>>> One thing that is not clear to me is how, in non-journal mode, to keep the page of a bitmap block stable.
>>> Because bit setting in the Ext4 bitmap is non-locking, a new bit might be set after the checksum is calculated.
>>
>> 	Every place that changes the bits will eventually call
>> ext4_journal_dirty(), which recalculates the checksum.  So there's no
>> danger of a set-bit-after-last-checksum.  But you will have to lock
>> around the checksum calculation in non-journaling mode.  JBD2 handles it
>> for journaling mode.
> 
> 	Wait, bitsetting in ext4 can't be non-locking.  Or are they
> crazily stomping on memory?  I sure see an assert_spin_locked() in
> mb_mark_used().
> 

Jan Kara | 1 Aug 2011 10:45

Re: [PATCH] ext3: fix message in ext3_remount for rw-remount case

On Mon 01-08-11 13:54:51, Toshiyuki Okajima wrote:
> If there are some inodes in the orphan list while a filesystem is
> mounted read-only, we should recommend that people umount and then
> mount it when they try to remount it read-write. But the current
> message/comment recommends that they umount and then remount it.
> 
> ext3_remount:
> 	/*
> 	 * If we have an unprocessed orphan list hanging
> 	 * around from a previously readonly bdev mount,
> 	 * require a full umount/remount for now.
>                           ^^^^^^^^^^^^^^
> 	 */
> 	if (es->s_last_orphan) {
> 		printk(KERN_WARNING "EXT3-fs: %s: couldn't "
> 			"remount RDWR because of unprocessed "
> 			"orphan inode list.  Please "
> 			"umount/remount instead.\n",
>                          ^^^^^^^^^^^^^^
> 			sb->s_id);
  OK, so how about using "umount & mount"? The '/' is what would confuse me
the most... BTW, I guess you didn't really see this message in practice, did
you?

									Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

Toshiyuki Okajima | 1 Aug 2011 11:45

Re: [PATCH] ext3: fix message in ext3_remount for rw-remount case

Hi.

(2011/08/01 17:45), Jan Kara wrote:
> On Mon 01-08-11 13:54:51, Toshiyuki Okajima wrote:
>> If there are some inodes in the orphan list while a filesystem is
>> mounted read-only, we should recommend that people umount and then
>> mount it when they try to remount it read-write. But the current
>> message/comment recommends that they umount and then remount it.
>>
>> ext3_remount:
>> 	/*
>> 	 * If we have an unprocessed orphan list hanging
>> 	 * around from a previously readonly bdev mount,
>> 	 * require a full umount/remount for now.
>>                            ^^^^^^^^^^^^^^
>> 	 */
>> 	if (es->s_last_orphan) {
>> 		printk(KERN_WARNING "EXT3-fs: %s: couldn't "
>> 			"remount RDWR because of unprocessed "
>> 			"orphan inode list.  Please "
>> 			"umount/remount instead.\n",
>>                           ^^^^^^^^^^^^^^
>> 			sb->s_id);

>    OK, so how about using "umount & mount"? The '/' is what would confuse me
OK. I'll modify it as you suggest.

  umount/mount => umount & mount

> the most... BTW, I guess you didn't really see this message in practice, did