Ken Bass | 24 Apr 2013 02:14

(LONG) Delay when writing to ext4 LVM after boot

(I previously asked this question in the LVM list, and they suggested I ask here.)

I have a large LV, about 6.5T, consisting of 4 physical drives of various sizes. The LV is formatted as ext4. There is no RAID involved (hardware or software).

After I first boot, if I try to write a large file (>~ 80M) to this LV, the write hangs for about 1 minute or more, then continues at full speed and finishes successfully. Writes of small files don't show this delay. After that first write and delay, all subsequent writes to other large files proceed at full speed.

I am currently running Fedora 17 64-bit (kernel 3.8.4-102.fc17.x86_64) but have noticed this on previous systems as well (both 64- and 32-bit). With smaller file systems (< 1T) there was a delay, but it was small, and it increased significantly as I increased the LV size.

I have run e2fsck with the -D option (before attempting a write), which made no difference. Also, fwiw,  I am mounting this with the default options. I've tried other options that were suggested to tweak ext4, but, again, no effect. This LV is also not my system (root) partition - that is on a separate physical drive.

Any ideas? Suggestions?

(I will gladly supply additional info as requested.)

TIA

ken



_______________________________________________
Ext3-users mailing list
Ext3-users <at> redhat.com
https://www.redhat.com/mailman/listinfo/ext3-users
aragonx | 26 Mar 2013 20:51

e2freefrag says filesystem too large

Can someone tell me if this will be fixed?

# e2freefrag /dev/sdl1
Device: /dev/sdl1
Blocksize: 4096 bytes
/dev/sdl1: Filesystem too large to use legacy bitmaps while reading block bitmap

# rpm -qa|grep e2fsprogs
e2fsprogs-libs-1.42.5-1.fc18.x86_64
e2fsprogs-1.42.5-1.fc18.x86_64

# df|grep /dev/sdl1
/dev/sdl1                                35143869536 30265426892  4175317880  88% /mnt/backup
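
A rough check of the numbers in the output above, as a minimal sketch. It assumes that df is reporting 1 KiB blocks (the GNU default) and that the "legacy bitmaps" in the error message are limited to 32-bit block numbers; neither assumption is confirmed in this post, and the function name is made up for illustration.

```python
# Values copied from the df and e2freefrag output above.
DF_TOTAL_KIB = 35143869536   # assumed to be in 1 KiB units (GNU df default)
BLOCK_SIZE = 4096            # "Blocksize: 4096 bytes"

def fs_blocks(total_kib: int, block_size: int) -> int:
    """Approximate number of filesystem blocks for a df-reported size."""
    return total_kib * 1024 // block_size

blocks = fs_blocks(DF_TOTAL_KIB, BLOCK_SIZE)
LIMIT_32BIT = 2 ** 32        # assumed ceiling for 32-bit block numbers

print("filesystem blocks:", blocks)
print("exceeds 32-bit block numbers:", blocks > LIMIT_32BIT)
```

Under those assumptions the filesystem has roughly twice as many blocks as a 32-bit block number can address, which would be consistent with the error message.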

Thanks!

---
Will Y.

Vincent Caron | 13 Mar 2013 00:56

ext4 and extremely slow filesystem traversal

Hello list,

  I have trouble with the daily backup of a modest filesystem, which
tends to take more than 10 hours. I have ext4 all over the place on ~200
servers and never ran into such a problem.

  The filesystem capacity is 300 GB (19,6M inodes) with 196 GB (9,3M
inodes) used. It's mounted 'defaults,noatime'. It sits on a hardware
RAID array thru plain LVM slices. The RAID array is a RAID5 running on
5x SATA 500G disks, with a battery-backed (RAM) cache and write-back
cache policy. To be precise, it's an Areca 1231.

  The hardware RAID array uses 64kB stripes and I've configured the
filesystem with 4kB blocks and stride=16. It also has 0 reserved blocks.
In other words the fs was created with 'mkfs -t ext4 -E stride=16 -m 0
-L volname /dev/vgX/Y'. I'm attaching the mke2fs.conf for reference too.

  Everything is running with Debian Squeeze and its 2.6.32 kernel (amd64
flavour), on a 4 cores and 4 GB RAM server.

  I ran a tiobench tonight on an idle instance (I have two identical
systems - hw, sw, data - with exactly the same problem). I've attached
results as plain text to protect them from line wrapping. They look fine
to me.

  When I try to backup the problematic filesystem with tar, rsync or
whatever tool traversing the whole filesystem, things are awful. I know
that this filesystem has *lots* of directories, most with few or no
files in them. Tonight I ran a simple 'find /path/to/vol -type d |pv
-bl' (counts directories as they are found), I stopped it more than 2
hours later : it was not done, and already counted more than 2M
directories. IO stats showed 1000 read calls/sec with avq=1 and avio=5
ms. CPU is 2% so it is totally I/O bound. This looks like the worst
random read case to me.

  I even tried a hack which tries to sort directories while traversing
the filesystem to no avail.
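
The post does not show the hack itself. One common form of the idea is to visit each directory's subdirectories in inode order rather than in the order readdir returns them; the sketch below is an illustration of that idea only, not the poster's code, and the function name is hypothetical. Note that the poster reports this kind of approach did not help on this filesystem.

```python
import os

def walk_dirs_by_inode(root):
    """Yield directory paths under root, visiting the subdirectories of
    each directory in ascending inode order (illustrative sketch)."""
    stack = [root]
    while stack:
        path = stack.pop()
        yield path
        try:
            with os.scandir(path) as entries:
                subdirs = [(entry.inode(), entry.path)
                           for entry in entries
                           if entry.is_dir(follow_symlinks=False)]
        except OSError:
            # Unreadable or vanished directory: skip it and carry on.
            continue
        # Push in descending inode order so the lowest inode is popped first.
        subdirs.sort(reverse=True)
        stack.extend(p for _, p in subdirs)
```

For example, `sum(1 for _ in walk_dirs_by_inode("/path/to/vol"))` counts directories in roughly the same way as the `find ... -type d | pv -bl` run described above.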

  Right now I don't even know how to analyze my filesystem further.
Sorry for not being able to describe it more accurately. I'm looking
for any advice or direction to improve this situation, while continuing
to use ext4 of course :).

  PS: I did ask the developers not to abuse the filesystem that way,
and told them that in 2013 it's okay to have 10k+ files per directory...
No success, so I guess I'll have to work around it.

filer:/srv/painfulvol/bench# tiobench --size 10000
Run #1: /usr/bin/tiotest -t 8 -f 1250 -r 500 -b 4096 -d . -T

Unit information
================
File size = megabytes
Blk Size  = bytes
Rate      = megabytes per second
CPU%      = percentage of CPU used during the test
Latency   = milliseconds
Lat%      = percent of requests that took longer than X seconds
CPU Eff   = Rate divided by CPU% - throughput per cpu load

Sequential Reads
                              File  Blk   Num                   Avg      Maximum      Lat%     Lat%    CPU
Identifier                    Size  Size  Thr   Rate  (CPU%)  Latency    Latency      >2s      >10s    Eff
---------------------------- ------ ----- ---  ------ ------ --------- -----------  -------- -------- -----
2.6.32-5-amd64               10000  4096    1  215.82 42.21%     0.017     1384.29   0.00000  0.00000   511
2.6.32-5-amd64               10000  4096    2  129.51 48.53%     0.057     5115.46   0.00020  0.00000   267
2.6.32-5-amd64               10000  4096    4   89.80 66.26%     0.168     6697.64   0.00043  0.00000   136
2.6.32-5-amd64               10000  4096    8   77.11 113.3%     0.394     6750.12   0.00102  0.00000    68

Random Reads
                              File  Blk   Num                   Avg      Maximum      Lat%     Lat%    CPU
Identifier                    Size  Size  Thr   Rate  (CPU%)  Latency    Latency      >2s      >10s    Eff
---------------------------- ------ ----- ---  ------ ------ --------- -----------  -------- -------- -----
2.6.32-5-amd64               10000  4096    1    0.79 0.302%     4.951       58.56   0.00000  0.00000   260
2.6.32-5-amd64               10000  4096    2    0.41 0.328%    17.165      174.55   0.00000  0.00000   126
2.6.32-5-amd64               10000  4096    4    0.80 1.024%    18.848      358.64   0.00000  0.00000    78
2.6.32-5-amd64               10000  4096    8    0.82 1.801%    35.989      808.74   0.00000  0.00000    45

Sequential Writes
                              File  Blk   Num                   Avg      Maximum      Lat%     Lat%    CPU
Identifier                    Size  Size  Thr   Rate  (CPU%)  Latency    Latency      >2s      >10s    Eff
---------------------------- ------ ----- ---  ------ ------ --------- -----------  -------- -------- -----
2.6.32-5-amd64               10000  4096    1  243.70 78.53%     0.014      492.80   0.00000  0.00000   310
2.6.32-5-amd64               10000  4096    2  186.89 150.9%     0.037     1969.62   0.00000  0.00000   124
2.6.32-5-amd64               10000  4096    4  113.90 209.8%     0.122     6303.26   0.00137  0.00000    54
2.6.32-5-amd64               10000  4096    8   88.32 336.6%     0.307     9451.83   0.00285  0.00000    26

Random Writes
                              File  Blk   Num                   Avg      Maximum      Lat%     Lat%    CPU
Identifier                    Size  Size  Thr   Rate  (CPU%)  Latency    Latency      >2s      >10s    Eff
---------------------------- ------ ----- ---  ------ ------ --------- -----------  -------- -------- -----
2.6.32-5-amd64               10000  4096    1  107.11 101.4%     0.009        0.06   0.00000  0.00000   106
2.6.32-5-amd64               10000  4096    2  173.32 337.2%     0.010        0.04   0.00000  0.00000    51
2.6.32-5-amd64               10000  4096    4  224.92 921.3%     0.011        0.76   0.00000  0.00000    24
2.6.32-5-amd64               10000  4096    8  206.05 1598.%     0.012        1.00   0.00000  0.00000    13
[defaults]
	base_features = sparse_super,filetype,resize_inode,dir_index,ext_attr
	blocksize = 4096
	inode_size = 256
	inode_ratio = 16384

[fs_types]
	ext3 = {
		features = has_journal
	}
	ext4 = {
		features = has_journal,extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize
		inode_size = 256
	}
	ext4dev = {
		features = has_journal,extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize
		inode_size = 256
		options = test_fs=1
	}
	small = {
		blocksize = 1024
		inode_size = 128
		inode_ratio = 4096
	}
	floppy = {
		blocksize = 1024
		inode_size = 128
		inode_ratio = 8192
	}
	news = {
		inode_ratio = 4096
	}
	largefile = {
		inode_ratio = 1048576
		blocksize = -1
	}
	largefile4 = {
		inode_ratio = 4194304
		blocksize = -1
	}
	hurd = {
	     blocksize = 4096
	     inode_size = 128
	}
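
As a cross-check between this configuration and the figures quoted in the message, here is a small sketch of the inode count implied by inode_ratio = 16384. It assumes "300 GB" means 300 GiB and ignores the per-group rounding that mke2fs applies, so it is only an approximation; the function name is made up for illustration.

```python
def expected_inodes(fs_bytes: int, inode_ratio: int) -> int:
    """Approximate inode count: one inode per inode_ratio bytes."""
    return fs_bytes // inode_ratio

FS_BYTES = 300 * 2 ** 30   # assumption: the 300 GB volume is 300 GiB
INODE_RATIO = 16384        # from the [defaults] section above

inodes = expected_inodes(FS_BYTES, INODE_RATIO)
print(f"{inodes} inodes (~{inodes / 1e6:.1f}M)")
```

This gives about 19.7M, in line with the 19,6M inodes reported for the filesystem.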
shubham | 7 Nov 2012 17:41

Need information about journal abortion and its relation with remounting

Hi Guys,

I was looking at the code of the ext3 file system and found some strange
implementation there.

Can someone please tell me whether the statements below are valid:

1. It might happen that the journal is aborted but the filesystem is not
re-mounted.
2. The journal gets aborted but it might still be possible to mount it in
read-write mode.
3. Can we write data to a partition whose journal has been aborted?

Thanks in advance

Regards
Shubham
Shubham Sharma | 25 Oct 2012 18:25

Relation between aborted journal and read only file system

Hi All,

This is my first post in this mailing list.

I am a learner in this field.

I have a query here :

1. If an ext3 filesystem's journal gets corrupted, is it necessary that the filesystem be mounted read-only?
Whether the answer is YES or NO, I would also like to know the reason.

(If I can get some links/document for this then it will be really useful)

2. Also, I would like to know how we can find the root cause of ext3 corruption.

3. If we find the root cause, how do we fix and test the code? I think regression testing
would be required in this case, because the file system can impact the whole system. So how can I be sure about my fix?

Please reply ..

Regards,
Shubham

Jayen | 15 Aug 2012 12:16

bind mounts, chroots, and mismatching kernel with userland

I have a 32-bit debian system, but I recently needed to mmap large
files, so I started using a 64-bit kernel with my 32-bit userland and
installed a 64-bit debian chroot to run inside.  I use bind mounts for
proc, sys, tmp, dev, and home.  It all works fine, but occasionally, the
filesystem gets corrupted (/ and /home are on the same system).  I am
running linux 3.2.21 (3.2.0-3 in debian).

Have I done something I shouldn't do?

I hope this is the right mailing list for this question; I couldn't find
another linux fs user mailing list.

Thanks,
Jayen
Peter Grandi | 7 Aug 2012 00:49

Re: resize too large

>> [ ... ] im on debian squeeze 2.6.32-5-amd64 [ ... ]
>> resize2fs: New size too large to be expressed in 32 bits [
>> ... ] data1 vgRAID6 -wi-ao 18.00t

>> Thanks for letting us know that 'resize2fs' and 'ext3' and
>> the Linux kernel continue to behave as documented.

Someone sent me an email with this question, and the answer may
be useful to others:

> but: where is it documented? [ ... ]

The 'ext3' filesystem is limited to 8/16TiB:

https://ext4.wiki.kernel.org/index.php/Ext4_Howto#Bigger_File_System_and_File_Sizes
 «Currently, Ext3 support 16 TiB of maximum file system size and
  2 TiB of maximum file size.»

(the same information appears in Wikipedia etc.)  and this is
the 'ext3' mailing list, not the 'ext4' mailing list which is
served at another place:

  https://ext4.wiki.kernel.org/index.php/Mailinglists

so I must assume that the original poster was indeed trying to
create an 'ext3' filesystem, and this configuration is thus not
going to take effect anyhow:

> ext4 = {
>  features = has_journal,extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize
>  auto_64-bit_support = 1 ## ADDED THIS
>  inode_size = 256
> }

It might feel tempting to convert an 'ext3' filesystem to 'ext4'
to escape the 8/16TiB limitation, but even for 'ext4' there is a
resize limit of less than 8/16TiB if the filesystem initial size
was less than 8/16TiB (which it must be if it was initially
'ext3'), because the 'ext4' ondisk layout by default is
compatible then with the 'ext3' ondisk layout if possible and
thus uses 32b offsets by default:

https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout#Block_Group_Descriptors
 «In ext2, ext3, and ext4 (when the 64bit feature is not enabled),
  the block group descriptor was only 32 bytes long and therefore
  ends at bg_used_dirs_count_lo. On an ext4 filesystem with the
  64bit feature enabled, the block group descriptor expands to the
  full 64 bytes described below.»

http://comments.gmane.org/gmane.comp.file-systems.ext4/33531
 «It is possible to format a 32-bit filesystem with larger group
  descriptors using the "-O 64bit" option, but this doesn't
  happen by default today.
  Possibly we should start using the 64-byte group descriptors
  by default for filesystems over, say, 4 TB, so they can be
  resized beyond 16 TB.
  It might also be possible to modify resize2fs to change the
  group descriptor size, but that isn't possible today.»
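
The arithmetic behind the 32-bit limit can be sketched as follows. The 18 TiB figure is the LV size from the original report (assuming LVM's "t" means TiB), and the 4 KiB block size is an assumption, since the original poster did not state it; the function name is made up for illustration.

```python
TIB = 2 ** 40

def max_fs_bytes(block_number_bits: int, block_size: int) -> int:
    """Largest filesystem addressable with the given block number width."""
    return (2 ** block_number_bits) * block_size

# With 4 KiB blocks, 32-bit block numbers top out at 16 TiB.
print(max_fs_bytes(32, 4096) // TIB, "TiB")

# Blocks needed for the 18.00t LV in the original report (4 KiB blocks assumed).
needed_blocks = 18 * TIB // 4096
print(needed_blocks, "blocks; fits in 32 bits:", needed_blocks < 2 ** 32)
```

Under these assumptions the requested size needs more than 2^32 blocks, which matches the "too large to be expressed in 32 bits" message.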

But the original poster reported:

>> resize2fs 1.41.14 (22-Dec-2010)

and the configuration above is not going to work anyhow as
'e2fsprogs' before 1.42 do not support sizes larger than 8/16TiB
for 'ext4' anyhow:

https://ext4.wiki.kernel.org/index.php/Ext4_Howto#Bigger_File_System_and_File_Sizes
 «Ext4 adds 48-bit block addressing, so it will have 1 EiB[1] of
  maximum file system size and 16 TiB of maximum file size. [
  ... ] NOTE!  The code to create file systems bigger than 16 TiB
  is, at the time of writing this article, not in any stable
  release of e2fsprogs. It will be in future releases.»

http://os1a.cs.columbia.edu/lxr/source/Documentation/filesystems/ext4.txt?v=2.6.32
 «* ability to use filesystems > 16TB (e2fsprogs support not
    available yet)»

http://e2fsprogs.sourceforge.net/e2fsprogs-release.html#1.42
 «E2fsprogs 1.42 (November 29, 2011)
  This release of e2fsprogs has support for file systems > 16TB.
  Online resize requires kernel support which will hopefully be in
  Linux version 3.2»

The above links also appeared in a web search for the phrase
"too large to be expressed in 32 bits" which yields some more
useful links (for 'ext4'):

http://blog.ronnyegner-consulting.de/2011/08/18/ext4-and-the-16-tb-limit-now-solved/
http://forums.debian.net/viewtopic.php?f=5&t=77522

Anyhow I reckon that filesystems significantly larger than
2-6TiB are not a good idea for a number of important reasons,
which however matter only when things go wrong, so most people
don't care, and that resizing is a somewhat dangerous operation
that has performance problems, so overall I would not recommend
going looking for trouble...
william L'Heureux | 4 Aug 2012 15:26

resize too large


I have a file system I am trying to resize via resize2fs but I get this error

resize2fs 1.41.14 (22-Dec-2010)
resize2fs: New size too large to be expressed in 32 bits

I'm on debian squeeze 2.6.32-5-amd64

# pvs
  PV         VG      Fmt  Attr PSize  PFree
  /dev/md1   vgRAID6 lvm2 a-   18.17t 134.12g

# lvs
  LV    VG      Attr   LSize  Origin Snap%  Move Log Copy%  Convert
  data1 vgRAID6 -wi-ao 18.00t

and the cryptsetup resize worked like a charm. I ran e2fsck before the resize, which passed successfully.

I added 

ext4 = {
                features = has_journal,extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize
                auto_64-bit_support = 1 ## ADDED THIS
                inode_size = 256
        }

to /etc/mke2fs.conf

and ran resize2fs and got the same error.
Jeremy Sanders | 14 Jun 2012 11:11

Filesystem is busy after umount and won't fsck

This is on Fedora 16, kernel 3.3.7-1.fc16.x86_64. sdb1/3 are ext3
devices on a Hitachi HDS721010KLA330.

# fsck /dev/sdb1
fsck from util-linux 2.20.1
e2fsck 1.41.14 (22-Dec-2010)
fsck.ext3: Device or resource busy while trying to open /dev/sdb1
Filesystem mounted or opened exclusively by another program?
# fsck /dev/sdb3
fsck from util-linux 2.20.1
e2fsck 1.41.14 (22-Dec-2010)
fsck.ext3: Device or resource busy while trying to open /dev/sdb3
Filesystem mounted or opened exclusively by another program?
# umount /dev/sdb1
umount: /dev/sdb1: not mounted
# umount /dev/sdb3
umount: /dev/sdb3: not mounted
# grep sdb /proc/mounts

# mount /dev/sdb1 /mnt/tmp
# umount /mnt/tmp
# fsck /dev/sdb1
fsck from util-linux 2.20.1
e2fsck 1.41.14 (22-Dec-2010)
fsck.ext3: Device or resource busy while trying to open /dev/sdb1
Filesystem mounted or opened exclusively by another program?

# lsof|grep sdb
jbd2/sdb1  3835          root  cwd       DIR                8,1     4096          2 /
jbd2/sdb1  3835          root  rtd       DIR                8,1     4096          2 /
jbd2/sdb1  3835          root  txt   unknown                                        /proc/3835/exe
jbd2/sdb3  3858          root  cwd       DIR                8,1     4096          2 /
jbd2/sdb3  3858          root  rtd       DIR                8,1     4096          2 /
jbd2/sdb3  3858          root  txt   unknown                                        /proc/3858/exe

Any idea what is going on? The device is usually fscked, then mounted
nightly, data is rsynced onto it, then unmounted. We'll probably upgrade
it to ext4 soon to see whether the problem goes away.

This has happened twice on this kernel (rebooted after it happened the
first time) and never before.
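
The manual `grep sdb /proc/mounts` check above could be scripted along these lines. This is only a sketch for automating the check before the nightly fsck; it does not explain the lingering jbd2 threads. The function name and the sample text are made up for illustration.

```python
def mounted_at(device: str, mounts_text: str) -> list:
    """Return the mount points listed for `device` in /proc/mounts-style text.

    Each line has the form: device mountpoint fstype options dump pass.
    (Spaces inside paths appear as \\040 there and are left undecoded here.)
    """
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == device:
            points.append(fields[1])
    return points

# Hypothetical sample; on a real system read the text from /proc/mounts.
SAMPLE = """\
/dev/sda1 / ext4 rw,relatime 0 0
/dev/sdb1 /mnt/tmp ext3 rw,relatime 0 0
proc /proc proc rw 0 0
"""

print(mounted_at("/dev/sdb1", SAMPLE))
print(mounted_at("/dev/sdb3", SAMPLE))
```

An empty list corresponds to the empty grep output shown in the transcript.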

This is dumpe2fs on /dev/sdb1:
Filesystem volume name:   <none>
Last mounted on:          /mnt/root_backup
Filesystem UUID:          2aeb7822-f81a-4b84-a9fd-76c4c5c27bba
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery sparse_super large_file
Filesystem flags:         signed_directory_hash
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              1835008
Block count:              3664820
Reserved block count:     183241
Free blocks:              1237712
Free inodes:              1545330
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      894
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         16384
Inode blocks per group:   512
Filesystem created:       Thu May 22 15:09:53 2008
Last mount time:          Wed Jun 13 03:47:45 2012
Last write time:          Wed Jun 13 03:47:45 2012
Mount count:              5
Maximum mount count:      28
Last checked:             Sat Jun  9 09:37:27 2012
Check interval:           15552000 (6 months)
Next check after:         Thu Dec  6 08:37:27 2012
Lifetime writes:          87 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               128
Journal inode:            8
Default directory hash:   tea
Directory Hash Seed:      706df78a-3ca8-46e8-a831-5f8ba43d2609
Journal backup:           inode blocks
Journal features:         journal_incompat_revoke
Journal size:             128M
Journal length:           32768
Journal sequence:         0x0011fbd9
Journal start:            1

Thanks

Jeremy
David Shaw | 5 Jun 2012 18:12

Proper stride and stripe-width for RAID 50

Hello,

I've been looking around, but can't seem to find an authoritative statement on setting stride and
stripe-width for RAID 50 (i.e. a RAID 0 over multiple RAID 5s)

Based on my understanding of what stride and stripe-width set, it seems to me that it should be calculated
the same as it would be if there were no multiple-level RAIDing involved.  For example, given a RAID 50 made
up of two 3+1 RAID 5s striped together (so 8 disks total) with a 512k chunk size and 4k block size, the stride
should be 128 (512 / 4) and the stripe-width should be 768 (stride * 6 data disks).  These settings should
work equally well whether the two RAID 5s are striped together or just appended one after the other via LVM.
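
The arithmetic in the example above can be written out as a small sketch. It only reproduces the calculation as described in this message and does not settle whether that reasoning is right for RAID 50; the function name is made up for illustration.

```python
def stride_and_stripe_width(chunk_kib: int, block_kib: int, data_disks: int):
    """Return (stride, stripe_width) in filesystem blocks.

    stride       = RAID chunk size / filesystem block size
    stripe_width = stride * number of data-bearing disks
    """
    stride = chunk_kib // block_kib
    return stride, stride * data_disks

# Two 3+1 RAID 5 sets striped together: 8 disks total, 6 of them carrying data.
stride, stripe_width = stride_and_stripe_width(chunk_kib=512, block_kib=4, data_disks=6)
print("stride =", stride, "stripe-width =", stripe_width)
```

With a 512k chunk, 4k blocks and 6 data disks this yields the 128 and 768 quoted above.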

Is my reasoning correct?

David
Badoo | 19 Apr 2012 10:00

Milos rovcanin sent you a message...

Milos rovcanin left you a message...

The sender of the message and its content will be visible only to you, and you can delete them at any time. You can also reply to it instantly using the messenger. To find out what the message says, click this link:

Read the message...


Some people nearby who are on Badoo

Ivan
Belgrade, Serbia
Saska
Belgrade, Serbia

Sanela
Belgrade, Serbia



This email delivers a message sent on our system by the user Milos rovcanin. If you received this email by mistake, please simply ignore it. The message will be removed from the system shortly.

Have fun!
The Badoo team

This email was sent by Badoo Trading Limited (postal address below). If you no longer wish to receive Badoo notifications by email, please click here to unsubscribe.
Badoo Trading Limited is a limited liability company registered in England and Wales under number 7540255, with its registered office at: 12 Red Lion Square, London, WC1R 4QD.
