Mag Gam | 1 Sep 2008 19:18

dynamic inode allocation

This may be a newbie question, but how come other file systems such as
ReiserFS and Veritas' VxFS dynamically allocate inodes, while for filesystems
such as ext2/ext3 and JFS we need to allocate them when creating the
filesystem? Is there a performance or maintenance gain from
preallocating?

TIA
Theodore Tso | 1 Sep 2008 20:37

Re: dynamic inode allocation

On Mon, Sep 01, 2008 at 01:18:31PM -0400, Mag Gam wrote:
> This may be a newbie question, but how come other file systems such as
> ReiserFS and Veritas' VxFS dynamically allocate inodes, while for filesystems
> such as ext2/ext3 and JFS we need to allocate them when creating the
> filesystem? Is there a performance or maintenance gain from
> preallocating?

Having a static inode table is definitely much simpler than a dynamic
inode table, and that's why ext2 originally used a static inode
allocation system.  Ext2 drew much of its initial design inspiration
from the BSD Fast Filesystem, and it (along with most traditional Unix
filesystems) used a static inode table.  

One of the advantages of having a static inode table is you can always
reliably find it.  With a dynamic inode table, it can often be much
more difficult to find it in the face of filesystem corruption, caused
by either hardware or software failure.  For example, with Reiserfs,
the inodes are stored in a B-Tree.  If the root node, or a relatively
high-level node of the B-tree is lost, the only way to recover all of
the inodes is by looking at each block, and trying to determine if it
"looks" like part of the filesystem B-tree or not.  This is what
reiserfs's fsck program will do if the filesystem is sufficiently
damaged.  Unfortunately, this means that if you store a reiserfs
filesystem image (for example, for use by vmware, or qemu, or kvm, or
xen) in a reiserfs filesystem, and the filesystem gets damaged, the
recovery procedure will take every single block that looks like it
could have been part of a Reiserfs B-tree, and stitch them together
into a new B-tree.  The result, if you have Reiserfs filesystem images,
is that those blocks will get treated as if they were part of the
containing filesystem, and the result is not pretty.
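
To make the "you can always reliably find it" point concrete: with a
static table, the location of any inode follows from plain arithmetic on
the inode number. The sketch below assumes a simplified ext2-style
layout. The Layout class, locate_inode, and every number in it are
hypothetical and chosen only for illustration; on a real filesystem the
geometry comes from the superblock and the table start blocks from the
group descriptors.

```python
from dataclasses import dataclass


@dataclass
class Layout:
    """Hypothetical, simplified ext2-style geometry (illustrative values)."""
    block_size: int = 4096
    inode_size: int = 128
    inodes_per_group: int = 8192
    # First block of each group's inode table, as the group descriptors
    # would record it. These block numbers are made up for the example.
    inode_table_start: tuple = (5, 32773, 65541)


def locate_inode(layout, ino):
    """Return (group, block, byte offset within block) for inode number ino."""
    if ino < 1:
        raise ValueError("inode numbers start at 1")
    # Inode numbers are 1-based, so subtract 1 before dividing.
    group, index = divmod(ino - 1, layout.inodes_per_group)
    byte_offset = index * layout.inode_size
    block = layout.inode_table_start[group] + byte_offset // layout.block_size
    return group, block, byte_offset % layout.block_size
```

No tree has to be walked and no other metadata has to be intact beyond
the group's table location, which is why a checker can still reach the
inodes after unrelated damage.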

Mag Gam | 1 Sep 2008 22:29

Re: dynamic inode allocation


Theodore Tso | 1 Sep 2008 22:39

Re: dynamic inode allocation

On Mon, Sep 01, 2008 at 04:29:06PM -0400, Mag Gam wrote:
>
> So, if a ReiserFS filesystem is damaged, it naturally does a fsck.
> The fsck basically recreates the b-tree by scanning from 1 to the end of
> the filesystem?

If the filesystem is sufficiently damaged such that portions of the
b-tree can't be found, then yes.  Otherwise, the data would be totally
lost.  As you can imagine, scanning every single block on the disk to
see if it looks like filesystem metadata is quite slow, so naturally
reiserfs's fsck will avoid doing it if at all possible.  But if
the root or top-level nodes of the B-tree are damaged, it doesn't have
much choice.
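
As a rough illustration of why such a scan cannot tell the containing
filesystem apart from an image stored inside it, here is a toy model. It
is not reiserfsck's actual logic: the 16-byte block size, the "TREE"
marker, and the helper names are all assumptions invented for the
example.

```python
BLOCK = 16       # tiny block size so the example stays readable
MAGIC = b"TREE"  # hypothetical marker meaning "looks like a tree node"


def node(payload: bytes) -> bytes:
    """Build a block that looks like a metadata (tree) node."""
    return (MAGIC + payload).ljust(BLOCK, b"\0")


def data(payload: bytes) -> bytes:
    """Build a plain data block."""
    return payload.ljust(BLOCK, b"\0")


def scan_for_nodes(disk: bytes) -> list:
    """Return the numbers of all blocks that start with the marker."""
    return [i // BLOCK for i in range(0, len(disk), BLOCK)
            if disk[i:i + BLOCK].startswith(MAGIC)]


# An inner filesystem image: one node block plus one data block.
inner_image = node(b"inner") + data(b"guest file")

# The outer filesystem: its own node, a data block, then a file whose
# contents are the inner image stored verbatim.
outer_disk = node(b"outer") + data(b"host file") + inner_image

found = scan_for_nodes(outer_disk)
print(found)
```

The scan reports both the outer node and the node that belongs to the
embedded image, because a block-by-block check sees only what each block
looks like, not which filesystem it belongs to.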

						- Ted
Mag Gam | 1 Sep 2008 23:16

Re: dynamic inode allocation

On Mon, Sep 1, 2008 at 4:39 PM, Theodore Tso <tytso <at> mit.edu> wrote:
> On Mon, Sep 01, 2008 at 04:29:06PM -0400, Mag Gam wrote:
>>
>> So, if a ReiserFS filesystem is damaged, it naturally does a fsck.
>> The fsck basically recreates the b-tree by scanning from 1 to the end of
>> the filesystem?
>
> If the filesystem is sufficiently damaged such that portions of the
> b-tree can't be found, then yes.  Otherwise, the data would be totally
> lost.  As you can imagine, scanning every single block on the disk to
> see if it looks like filesystem metadata is quite slow, so naturally
> reiserfs's fsck will avoid doing it if at all possible.  But if
> the root or top-level nodes of the B-tree are damaged, it doesn't have
> much choice.
>
>                                                - Ted
>

But if that's the last-resort, worst-case scenario, why don't they do the
full scan? Sure, it's going to take a long time if it's a big filesystem
(there should be no changes since it would be unmounted), but it's
better than not having any data at all...
Theodore Tso | 1 Sep 2008 23:23

Re: dynamic inode allocation

On Mon, Sep 01, 2008 at 05:16:01PM -0400, Mag Gam wrote:
> > If the filesystem is sufficiently damaged such that portions of the
> > b-tree can't be found, then yes.  Otherwise, the data would be totally
> > lost.  As you can imagine, scanning every single block on the disk to
> > see if it looks like filesystem metadata is quite slow, so naturally
> > reiserfs's fsck will avoid doing it if at all possible.  But if
> > the root or top-level nodes of the B-tree are damaged, it doesn't have
> > much choice.
> >
>
> But if that's the last-resort, worst-case scenario, why don't they do the
> full scan? Sure, it's going to take a long time if it's a big filesystem
> (there should be no changes since it would be unmounted), but it's
> better than not having any data at all...

As I said, in the worst case, it will do a full scan.  But (a) it
takes a long time, and (b) if the filesystem has any files that
contain images of reiserfs filesystems, the result will be totally
scrambled.  So it makes sense that the reiserfs fsck would try to avoid
this if it can (i.e., if the b-tree is only mildly corrupted).

With that said, this is really going out of scope for this mailing
list.  And I am not an expert on reiserfs's filesystem checker,
although I have had people confirm to me that, indeed, you can lose
really big if your reiserfs filesystem contains files that are
images of other reiserfs filesystems for things like virtualization.
This problem is apparently solved in reiser4; it is NOT solved in
reiserfs (i.e., version 3).  As far as I am concerned, that's ample
reason not to use reiserfs, but obviously I'm biased.  :-)


Mag Gam | 1 Sep 2008 23:47

Re: dynamic inode allocation

Thanks!

This has cured my curiosity (for now...)


thorsten.henrici | 2 Sep 2008 22:03

Thorsten Henrici is out of the office.


I will be out of the office from 27.08.2008 and will return on
22.09.2008.

I will reply to your message after my return. In urgent cases, please
contact Mr. Karl-Heinz Stöver.

Theodore Tso | 3 Sep 2008 15:45

Re: spd_readdir.c and readdir_r [real new version]

Hey Ross,

Sorry for not responding earlier; I was travelling a lot over the
summer, and I never got around to responding to your e-mail.

Many thanks for adding support for readdir_r and readdir64_r!  As it
turns out, I was doing some updates to spd_readdir.c to support
fdopendir (which rm uses).  Also, it looks like you based your changes
off of an older version of spd_readdir.c that didn't support the
dirfd() call.  I probably will try to package this up into its own
package, since I suspect it would be useful to a larger set of people.

In any case here's the merged version I have.  Please let me know if
this works for you, and if you have any other suggested improvements!

	       	 	    	     	       	 - Ted

Attachment (spd_readdir.c): text/x-csrc, 10 KiB
_______________________________________________
Ext3-users mailing list
Ext3-users <at> redhat.com
https://www.redhat.com/mailman/listinfo/ext3-users
Theodore Tso | 3 Sep 2008 18:09

Re: Problem in HTREE directory node

On Mon, Aug 25, 2008 at 11:40:06AM -0700, Ross Boylan wrote:
> Short version: 
> 
> fsck said
> "invalid HTREE directory inode 635113
> (mail/r/user/ross/comp/admin-wheat) clear HTREE index?" To which I
> replied Yes.  
> 
> What exactly does this mean was corrupted?  In particular, does it mean
> the list of files in the directory .../comp/admin-wheat was damaged?  Or
> is the trouble in the comp directory?
> 
> Is fsck likely to have fixed up things as good as new, or might
> something be lost or corrupted?  I don't know what clearing the HTREE
> index does.

That just means that the interior nodes in the HTREE were corrupt.  If
you give permission to clear the htree index, e2fsck puts the inode on
the list of directories that need to have their HTREE indexes rebuilt,
and a "Pass 3A" will rebuild the directory's (or directories') HTREE
indexes.  This is similar to what "e2fsck -fD" does, except it only
rebuilds directories whose HTREE indexes were corrupted, instead of
rebuilding and optimizing all of the directories in the filesystem.

So if that was the only message you received, and there were no other
reports of damage to the directory, you wouldn't have lost any
directory names.  It's in all likelihood "good as new".
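
Why clearing the index is harmless can be sketched as follows: the file
names live in the directory's leaf blocks, and the HTREE index is only a
lookup structure that can be derived from them again. The model below is
a toy, not e2fsck's code. name_hash (CRC32 here), build_index, and the
sample names are assumptions made for illustration, and a real HTREE
maps hash ranges to blocks rather than individual hashes.

```python
import zlib


def name_hash(name: str) -> int:
    """Stand-in hash; ext3 uses its own hash functions, not CRC32."""
    return zlib.crc32(name.encode())


def build_index(leaf_blocks):
    """Derive a hash -> set of leaf block numbers index from the entries."""
    index = {}
    for block_no, names in enumerate(leaf_blocks):
        for name in names:
            index.setdefault(name_hash(name), set()).add(block_no)
    return index


def lookup(index, leaf_blocks, name):
    """Use the index to find the leaf block, then confirm the name is there."""
    candidates = index.get(name_hash(name), ())
    return any(name in leaf_blocks[b] for b in candidates)


def list_names(leaf_blocks):
    """List every name by reading the leaf blocks directly (no index used)."""
    return sorted(n for names in leaf_blocks for n in names)


# Directory entries, grouped by the leaf block that stores them.
leaf_blocks = [["cur", "new"], ["tmp", "cyrus.index"]]

before = list_names(leaf_blocks)
index = build_index(leaf_blocks)
index.clear()                     # "clear HTREE index": discard the index
index = build_index(leaf_blocks)  # rebuild it from the surviving entries
after = list_names(leaf_blocks)
```

In this model the set of names is identical before and after, since the
rebuild never touches the leaf entries themselves.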

Regards,
