Kevin Bowen | 4 Jan 04:28 2010

ext3 resize failed, data loss

I used parted to resize (shrink) an ext3 filesystem and associated
partition, and it buggered my system. The operation completed
apparently successfully, reporting no errors, but after reboot, the fs
wouldn't mount, being marked as having errors, and e2fsck said
"The filesystem size (according to the superblock) is xxx blocks
The physical size of the device is xxx blocks Either the superblock or
the partition table is likely to be corrupt!". So the fs still thought
it was its original size (larger than its partition).
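
As far as I can tell, the check it's complaining about boils down to something like the sketch below (rough, untested code against libext2fs, just to illustrate the mismatch; the function names are as I understand the e2fsprogs library, so treat this as an assumption, not a recipe):

/* Sketch: compare the block count recorded in the superblock with the
 * number of blocks the underlying device actually has. */
#include <ext2fs/ext2fs.h>
#include <stdio.h>

int main(int argc, char **argv)
{
        ext2_filsys fs;
        blk_t dev_blocks;

        if (argc != 2) {
                fprintf(stderr, "usage: %s <device>\n", argv[0]);
                return 1;
        }
        if (ext2fs_open(argv[1], 0, 0, 0, unix_io_manager, &fs)) {
                fprintf(stderr, "cannot read superblock\n");
                return 1;
        }
        if (ext2fs_get_device_size(argv[1], fs->blocksize, &dev_blocks) == 0) {
                printf("superblock says %u blocks, device has %u blocks\n",
                       fs->super->s_blocks_count, dev_blocks);
                if (fs->super->s_blocks_count > dev_blocks)
                        printf("filesystem is larger than its partition\n");
        }
        ext2fs_close(fs);
        return 0;
}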

At this point, the fs would actually mount without errors if I mounted
it manually (ro), and all my data seemed intact, it just thought it
had way more free space than it should have, and it couldn't complete
an fsck (and was obviously not safe to use mounted rw lest it try to
write to space it didn't actually own). Google turned up some accounts
of people with the identical issue, and suggestions to fix it by
writing a new superblock with e2fsck -S, then fscking - I did this, and
it totally trashed my filesystem. The fs is now the right size and
mounts fine, but everything just got dumped into lost+found.

Is there any way I can fix this and get my data back? At least get it
back to its previous state so I can mount it ro and copy my data off?
Is my old superblock backed up somewhere, or does e2fsck update the
backup superblock as well? Would my old superblock even help, or did
the fsck trash my inode structure?

Currently I think I have all my data, just dumped in lost+found
without filenames - is there any way to salvage anything from that?

And is this a known bug in ext2resize? In parted?


Chris Mason | 4 Jan 17:27 2010

Re: [Jfs-discussion] benchmark results

On Fri, Dec 25, 2009 at 11:11:46AM -0500, tytso <at> mit.edu wrote:
> On Fri, Dec 25, 2009 at 02:46:31AM +0300, Evgeniy Polyakov wrote:
> > > [1] http://samba.org/ftp/tridge/dbench/README
> > 
> > Could not resist writing a small note: no matter what, whichever
> > benchmark is running, it _does_ show system behaviour in one condition
> > or another. And when the system behaves rather badly, it is quite a
> > common comment that the benchmark was useless. But it did show that the
> > system has a problem, even if a rarely triggered one :)
> 
> If people are using benchmarks to improve file systems, and a benchmark
> shows a problem, then trying to remedy the performance issue is a good
> thing to do, of course.  Sometimes, though, the case which is
> demonstrated by a poor benchmark is an extremely rare corner case that
> doesn't accurately reflect common real-life workloads --- and if
> addressing it results in a tradeoff which degrades much more common
> real-life situations, then that would be a bad thing.
> 
> In situations where benchmarks are used competitively, it's rare that
> it's actually a *problem*.  Instead it's much more common that a
> developer is trying to prove that their file system is *better* to
> gullible users who think that a single one-dimensional number is
> enough for them to choose file system X over file system Y.

[ Look at all this email from my vacation...sorry for the delay ]

It's important that people take benchmarks from filesystem developers
with a big grain of salt, which is one reason the boxacle.net results
are so nice.  Steve is more than willing to take patches and experiment to
improve a given FS's results, but his business is a fair representation of

Steven Pratt | 5 Jan 16:31 2010

Re: [Jfs-discussion] benchmark results

Dave Chinner wrote:
> On Mon, Jan 04, 2010 at 11:27:48AM -0500, Chris Mason wrote:
>   
>> On Fri, Dec 25, 2009 at 11:11:46AM -0500, tytso <at> mit.edu wrote:
>>     
>>> On Fri, Dec 25, 2009 at 02:46:31AM +0300, Evgeniy Polyakov wrote:
>>>       
>>>>> [1] http://samba.org/ftp/tridge/dbench/README
>>>>>           
>>>> Could not resist writing a small note: no matter what, whichever
>>>> benchmark is running, it _does_ show system behaviour in one condition
>>>> or another. And when the system behaves rather badly, it is quite a
>>>> common comment that the benchmark was useless. But it did show that the
>>>> system has a problem, even if a rarely triggered one :)
>>>>         
>>> If people are using benchmarks to improve file systems, and a benchmark
>>> shows a problem, then trying to remedy the performance issue is a good
>>> thing to do, of course.  Sometimes, though, the case which is
>>> demonstrated by a poor benchmark is an extremely rare corner case that
>>> doesn't accurately reflect common real-life workloads --- and if
>>> addressing it results in a tradeoff which degrades much more common
>>> real-life situations, then that would be a bad thing.
>>>
>>> In situations where benchmarks are used competitively, it's rare that
>>> it's actually a *problem*.  Instead it's much more common that a
>>> developer is trying to prove that their file system is *better* to
>>> gullible users who think that a single one-dimensional number is
>>> enough for them to choose file system X over file system Y.
>>>       
>> [ Look at all this email from my vacation...sorry for the delay ]

Martin Baum | 6 Jan 12:00 2010

Optimizing dd images of ext3 partitions: Only copy blocks in use by fs

Hello,

for bare-metal recovery I need to create complete disk images of ext3  
partitions of about 30 servers. I'm doing this by creating  
lvm2-snapshots and then dd'ing the snapshot-device to my backup media.  
(I am aware that backups created by this procedure are the equivalent  
of hitting the power switch at the time the snapshot was taken.)

This works great and avoids a lot of seeks on highly utilized file  
systems. However it wastes a lot of space for disks with nearly empty  
filesystems.

It would be a lot better if I could only read the blocks from raw disk  
that are really in use by ext3 (the rest could be sparse in the  
image file created). Is there a way to do this?

I am aware that e2image -r dumps all metadata. Is there a tool that
dumps not only the metadata but also the data blocks? (Maybe even in a
way that avoids seeks by compiling a list of blocks first and then
reading them in disk order.) If not: is there a tool I can extend to do
so, or can you point me in the right direction?
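
To make it concrete, this is roughly the kind of tool I have in mind (a rough, untested sketch against libext2fs from e2fsprogs; the library calls are as I understand them, error handling and the read-in-disk-order optimization are left out, and it would need -D_FILE_OFFSET_BITS=64 to go past 2GB):

/* Sketch: copy only the blocks the block bitmap marks as in use into a
 * sparse image file; unused blocks become holes. */
#include <ext2fs/ext2fs.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
        ext2_filsys fs;
        blk_t blk;
        char buf[65536];        /* large enough for any ext3 block size */
        int in, out;

        if (argc != 3) {
                fprintf(stderr, "usage: %s <device> <image>\n", argv[0]);
                return 1;
        }
        if (ext2fs_open(argv[1], 0, 0, 0, unix_io_manager, &fs) ||
            ext2fs_read_block_bitmap(fs))
                return 1;

        in = open(argv[1], O_RDONLY);
        out = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0600);

        /* make the image nominally as large as the filesystem */
        ftruncate(out, (off_t)fs->super->s_blocks_count * fs->blocksize);

        for (blk = fs->super->s_first_data_block;
             blk < fs->super->s_blocks_count; blk++) {
                if (!ext2fs_test_block_bitmap(fs->block_map, blk))
                        continue;       /* unused block: leave a hole */
                pread(in, buf, fs->blocksize, (off_t)blk * fs->blocksize);
                pwrite(out, buf, fs->blocksize, (off_t)blk * fs->blocksize);
        }
        close(in);
        close(out);
        ext2fs_close(fs);
        return 0;
}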

(I tried dumpfs, however it dumps inodes on a per-directory basis.
Skimming through the source I did not see any optimization regarding
seeks, so on highly populated filesystems dumpfs is still slower than
full images with dd for me.)

Thanks a lot,
Martin

Andreas Dilger | 6 Jan 22:09 2010

Re: Optimizing dd images of ext3 partitions: Only copy blocks in use by fs

On 2010-01-06, at 04:00, Martin Baum wrote:
> for bare-metal recovery I need to create complete disk images of  
> ext3 partitions of about 30 servers. I'm doing this by creating lvm2- 
> snapshots and then dd'ing the snapshot-device to my backup media. (I  
> am aware that backups created by this procedure are the equivalent  
> of hitting the power switch at the time the snapshot was taken.)
>
> This works great and avoids a lot of seeks on highly utilized file  
> systems. However it wastes a lot of space for disks with nearly  
> empty filesystems.
>
> It would be a lot better if I could only read the blocks from raw  
> disk that are really in use by ext3 (the rest could be sparse in the  
> imagefile created). Is there a way to do this?

You can use "dump" which will read only the in-use blocks, but it  
doesn't create a full disk image.

The other trick that I've used for similar situations is to write a  
file of all zeroes to the filesystem until it is full (e.g. dd
if=/dev/zero of=/foo) and then the backup will be able to compress quite
well.  If the filesystem is in use, you should stop before the  
filesystem is completely full, and also unlink the file right after it  
is created, so in case of trouble the file will automatically be  
unlinked (even after a crash).
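
Something like this, roughly (a sketch only; the path and the 64 MB stop-early margin are placeholders I picked for illustration, not anything standard):

/* Sketch: create a scratch file, unlink it immediately so it vanishes
 * even after a crash, then fill it with zeroes, stopping shortly
 * before the filesystem is completely full. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/statvfs.h>
#include <unistd.h>

int main(void)
{
        static char buf[1 << 20];               /* 1 MB of zeroes per write */
        struct statvfs sv;
        const char *path = "/foo/zerofill.tmp"; /* placeholder path */
        int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);

        if (fd < 0) {
                perror("open");
                return 1;
        }
        unlink(path);   /* unlink right away; the blocks are freed on close */
        memset(buf, 0, sizeof(buf));

        for (;;) {
                if (fstatvfs(fd, &sv) == 0 &&
                    (long long)sv.f_bavail * sv.f_frsize < 64LL << 20)
                        break;  /* leave ~64 MB free instead of hitting ENOSPC */
                if (write(fd, buf, sizeof(buf)) <= 0)
                        break;  /* ENOSPC or another error */
        }
        close(fd);              /* space is released here */
        return 0;
}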

> I am aware that e2image -r dumps all metadata. Is there a tool that  
> does not only dump metadata but also the data blocks? (maybe even in  
> a way that avoids seeks by compiling a list of blocks first and then  
> reading them in disk-order) If not: Is there a tool I can extend to  

Michael Rubin | 4 Jan 19:57 2010

Re: [Jfs-discussion] benchmark results

Google is currently in the middle of upgrading from ext2 to a more up
to date file system. We ended up choosing ext4. This thread touches
upon many of the issues we wrestled with, so I thought it would be
interesting to share. We should be sending out more details soon.

The driving performance reason to upgrade is that while ext2 had been "good
enough" for a very long time the metadata arrangement on a stale file
system was leading to what we call "read inflation". This is where we
end up doing many seeks to read one block of data. In general latency
from poor block allocation was causing performance hiccups.

We spent a lot of time with standard Unix benchmarks (dbench, compile
bench, et al.) on xfs, ext4, and jfs to try to see which one was going to
perform the best. In the end we mostly ended up using the benchmarks
to validate our assumptions and do functional testing. Larry is
completely right IMHO. These benchmarks were instrumental in helping
us understand how the file systems worked in controlled situations and
gain confidence from our customers.

For our workloads we saw ext4 and xfs as "close enough" in performance
in the areas we cared about. The fact that we had a much smoother
upgrade path with ext4 clinched the deal. The only upgrade option we
have is online. ext4 is already moving the bottleneck away from the
storage stack for some of our most intensive applications.

It was not until we moved from benchmarks to customer workload that we
were able to make detailed performance comparisons and find bugs in
our implementation.

"Iterate often" seems to be the winning strategy for SW dev. But when

Dave Chinner | 5 Jan 01:41 2010

Re: [Jfs-discussion] benchmark results

On Mon, Jan 04, 2010 at 11:27:48AM -0500, Chris Mason wrote:
> On Fri, Dec 25, 2009 at 11:11:46AM -0500, tytso <at> mit.edu wrote:
> > On Fri, Dec 25, 2009 at 02:46:31AM +0300, Evgeniy Polyakov wrote:
> > > > [1] http://samba.org/ftp/tridge/dbench/README
> > > 
> > > Could not resist writing a small note: no matter what, whichever
> > > benchmark is running, it _does_ show system behaviour in one condition
> > > or another. And when the system behaves rather badly, it is quite a
> > > common comment that the benchmark was useless. But it did show that the
> > > system has a problem, even if a rarely triggered one :)
> > 
> > If people are using benchmarks to improve file systems, and a benchmark
> > shows a problem, then trying to remedy the performance issue is a good
> > thing to do, of course.  Sometimes, though, the case which is
> > demonstrated by a poor benchmark is an extremely rare corner case that
> > doesn't accurately reflect common real-life workloads --- and if
> > addressing it results in a tradeoff which degrades much more common
> > real-life situations, then that would be a bad thing.
> > 
> > In situations where benchmarks are used competitively, it's rare that
> > it's actually a *problem*.  Instead it's much more common that a
> > developer is trying to prove that their file system is *better* to
> > gullible users who think that a single one-dimensional number is
> > enough for them to choose file system X over file system Y.
> 
> [ Look at all this email from my vacation...sorry for the delay ]
> 
> It's important that people take benchmarks from filesystem developers
> with a big grain of salt, which is one reason the boxacle.net results
> are so nice.  Steve more than willing to take patches and experiment to

Casey Allen Shobe | 11 Jan 02:03 2010

Re: [Jfs-discussion] benchmark results

On Dec 25, 2009, at 11:22 AM, Larry McVoy wrote:
> Dudes, sync() doesn't flush the fs cache, you have to unmount for  
> that.
> Once upon a time Linux had an ioctl() to flush the fs buffers, I used
> it in lmbench.

You do not need to unmount - 2.6.16+ kernels have a mechanism in /proc
to flush caches.  See http://linux-mm.org/Drop_Caches
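
For example (a minimal sketch; it needs root and a 2.6.16+ kernel, and I call sync() first so dirty pages become clean and can actually be dropped):

/* Sketch: flush dirty data, then ask the kernel to drop clean caches. */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
        FILE *f;

        sync();         /* write dirty pages back so they become droppable */

        f = fopen("/proc/sys/vm/drop_caches", "w");
        if (!f) {
                perror("/proc/sys/vm/drop_caches");
                return 1;
        }
        fputs("3\n", f);        /* 1 = pagecache, 2 = dentries+inodes, 3 = both */
        fclose(f);
        return 0;
}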

Cheers,
--

-- 
Casey Allen Shobe
casey <at> shobe.info


Larry McVoy | 11 Jan 02:32 2010

Re: [Jfs-discussion] benchmark results

On Sun, Jan 10, 2010 at 08:03:04PM -0500, Casey Allen Shobe wrote:
> On Dec 25, 2009, at 11:22 AM, Larry McVoy wrote:
>> Dudes, sync() doesn't flush the fs cache, you have to unmount for  
>> that.
>> Once upon a time Linux had an ioctl() to flush the fs buffers, I used
>> it in lmbench.
>
>
> You do not need to unmount - 2.6.16+ kernels have a mechanism in /proc
> to flush caches.  See http://linux-mm.org/Drop_Caches

Cool, but I tend to come at problems from a cross-platform point of view.
AIX no habla /proc :)
--

-- 
---
Larry McVoy                lm at bitmover.com           http://www.bitkeeper.com

lakshmi pathi | 13 Jan 10:05 2010

Fwd: ext4_inode: i_block[] doubt

~~~~~~~~~~~~~~~~
I looked for an ext4-users mailing list but was unable to find one, so I
posted this question to ext4-beta-list <at> redhat.com - it seems that
mailing list is less active, so I'm posting it again to the ext3 list.
~~~~~~~~~~~~~~~~
I was accessing an ext4 file using the ext2fs library (from
e2fsprogs-1.41.9, Fedora 12). While parsing the inode contents I got the
output below. Let me know whether my assumptions are correct.

---------------
// code part: print the raw i_block[] values of the inode
ext2fs_read_inode(current_fs, d->d_ino, &inode);
for (i = 0; i < 15; i++)        // 15 == EXT2_N_BLOCKS
        printf("\ni_block[%d] :%u", i, inode.i_block[i]);
---------------

In struct ext4_inode's i_block[EXT4_N_BLOCKS], i_block[0] to i_block[2]
hold the extent header and tell whether this inode uses an extent tree
for storing the file's data blocks.

//output
i_block[0] :324362
i_block[1] :4
i_block[2] :0

// remaining i_block[3] to i_block[14] hold four extents in the following format:
// {extent index, number of blocks in extent, starting block number}
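
In other words, I am reading i_block[0] to i_block[2] as the extent header like this (a quick sketch; the struct below just mirrors what I believe the 12-byte on-disk layout to be, and the field names are my own, so please correct me if this is wrong):

/* Sketch: decode i_block[0..2] as an extent header. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct extent_header {
        uint16_t magic;         /* 0xF30A when the inode uses extents */
        uint16_t entries;       /* number of extents actually in use */
        uint16_t max;           /* capacity of this node (4 inside the inode) */
        uint16_t depth;         /* 0 = entries point directly at data blocks */
        uint32_t generation;
};

int main(void)
{
        uint32_t i_block[3] = { 324362, 4, 0 }; /* values from the output above */
        struct extent_header eh;

        memcpy(&eh, i_block, sizeof(eh));       /* assumes a little-endian host */
        printf("magic=0x%x entries=%u max=%u depth=%u generation=%u\n",
               eh.magic, eh.entries, eh.max, eh.depth, eh.generation);
        /* prints: magic=0xf30a entries=4 max=4 depth=0 generation=0 */
        return 0;
}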

