Joseph Cheng | 1 Dec 2006 03:55

maintain 6TB filesystem + fsck

I posted on the RHEL list about properly creating and tuning a 6TB ext3
filesystem:
http://www.redhat.com/archives/nahant-list/2006-November/msg00239.html
I am reading lots of ext3 links like:
http://www.redhat.com/support/wpapers/redhat/ext3/
http://lists.centos.org/pipermail/centos/2005-September/052533.html
http://batleth.sapienti-sat.org/projects/FAQs/ext3-faq.html
but there are not enough links about large multi-TB arrays and ext3 :( There
are lots of old FAQs and outdated information, so please show me the error of
my ways lol. After reading mailing list posts I have created filesystems like
this:
mkfs.ext3 -b 4096 -i 65536 -j -m 1 -O dir_index -L /prodspace1 /dev/sda1

I put the output of mkfs.ext3 and tune2fs -l below. Is there anything
that I am mistaken about? My other problem is fsck. I read here:
http://listman.redhat.com/archives/ext3-users/2006-October/msg00005.html
'The major problem at this point is e2fsck time, which is about 1h/TB
for fast disks, at minimum (i.e. no major corruption found).'
Is that ext3 or ext4? I don't know how long fsck will take with a
6TB ext3 filesystem. I first chose to disable auto fsck with 'tune2fs
-i0 -c0 /dev/sda1', but that seems dangerous if the filesystem becomes
corrupt without my knowledge! What is a good balance between running auto
fsck after a number of mounts or a period of time and keeping fsck time
short for large arrays? Info:
OS is RHEL ES 4 Update 4 with generic server hardware
storage hardware is multiple Apple Xserve RAIDs with a 6TB array each
filesystem is expected to contain 10 MB files, up to maybe 50 MB or even 100 MB

# tune2fs -l /dev/sda1
tune2fs 1.35 (28-Feb-2004)
Filesystem volume name:   /prodspace1

Andreas Dilger | 1 Dec 2006 13:44

Re: maintain 6TB filesystem + fsck

On Nov 30, 2006  21:55 -0500, Joseph Cheng wrote:
> http://listman.redhat.com/archives/ext3-users/2006-October/msg00005.html
> 'The major problem at this point is e2fsck time, which is about 1h/TB
> for fast disks, at minimum (i.e. no major corruption found).'
> .........is that ext3 or ext4?

I don't think it really matters.

> i don't know how long fsck will take w/ 6TB ext3 filesystem.

You have such a filesystem, test it...

> i first choose to disable auto fsck with 'tune2fs
> -i0 -c0 /dev/sda1' but seems dangerous if filesystem become corrupt
> without my knowledge! what is good balance between using auto fsck
> after number of mounts or time pass and keeping fsck time short for
> large arrays? info......

You can optionally run e2fsck -fn on a relatively quiet (though mounted)
filesystem, and if it checks (relatively) clean then you could reset the
fsck time in the superblock via tune2fs.
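
A minimal sketch of that procedure, for illustration only. The function name
check_and_reset is made up here, and the sketch is stricter than "relatively
clean": it only resets the counters when e2fsck exits with status 0. A
read-only check of a mounted filesystem can report spurious errors, so a
non-zero status is a hint to look closer rather than proof of corruption.

```shell
#!/usr/bin/env bash
# check_and_reset DEVICE   (hypothetical helper name)
# Run a forced, read-only check. Only if e2fsck exits 0 (no errors found),
# reset the mount count and the last-checked time in the superblock so the
# next automatic boot-time fsck is postponed.
check_and_reset() {
    local dev=$1
    if e2fsck -fn "$dev"; then
        # -C 0 resets the mount count, -T now sets the last-checked time.
        tune2fs -C 0 -T now "$dev"
    else
        echo "e2fsck reported problems on $dev; plan a full offline check" >&2
        return 1
    fi
}

# Example, using the device name from the original post:
# check_and_reset /dev/sda1
```

This keeps the count and interval triggers enabled (instead of 'tune2fs -i0
-c0') while letting you choose when the long check actually runs.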

> filesystem expected to contain 10 mb files to maybe even 50 mb + 100mb

One of the major slowdowns for e2fsck is the number of inodes, so if you
expect to have very large files you should create the filesystem with
this in mind (i.e. "mke2fs -T largefile" or "mke2fs -T largefile4").
Expect e2fsck RAM usage to be about .75 * num_inodes + .25 * num_blocks
bytes, so in the neighbourhood of 500MB for your filesystem; reducing the
inode count would also help with this a fair amount.
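
As a rough worked example of that rule of thumb (reading the result as bytes,
which matches the ~500MB figure), the sketch below assumes the array is
exactly 6 TiB and uses the -b 4096 and -i 65536 values from the mkfs.ext3
command in the original post. The real device will be somewhat smaller, and
the function name is made up for illustration.

```shell
#!/usr/bin/env bash
# e2fsck_ram_bytes NUM_INODES NUM_BLOCKS   (hypothetical helper name)
# Rule of thumb from this thread: 0.75 bytes per inode + 0.25 bytes per block.
# Written as (3*inodes + blocks) / 4 to stay in integer arithmetic.
e2fsck_ram_bytes() {
    echo $(( (3 * $1 + $2) / 4 ))
}

fs_bytes=$(( 6 * 1024 ** 4 ))        # assumption: exactly 6 TiB
blocks=$(( fs_bytes / 4096 ))        # mkfs.ext3 -b 4096

# As created in the original post: one inode per 64 KiB (-i 65536).
inodes=$(( fs_bytes / 65536 ))
echo "$(( $(e2fsck_ram_bytes "$inodes" "$blocks") / 1024 / 1024 )) MiB"   # 456 MiB

# With "mke2fs -T largefile": one inode per 1 MiB.
inodes_large=$(( fs_bytes / 1048576 ))
echo "$(( $(e2fsck_ram_bytes "$inodes_large" "$blocks") / 1024 / 1024 )) MiB"   # 388 MiB
```

With these numbers the per-block term dominates the estimate, so the saving
from a lower inode count is real but bounded.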

dushy | 6 Dec 2006 16:38

File size differences

Hey,

I have two identical machines, each set up with a RAID 5 array. One of them is
used for failover, and data from the master is synced to it every day using
rsync. The data on these disks is mostly intranet KBs, DBs, etc.

The RAID 5 arrays are formatted using the default options, i.e. mkfs.ext3
/dev/Xda. The RAID controller is a 3ware Escalade and each disk in the RAID
5 array is a 400GB IDE drive.

Now the weird part is that after syncing the failover with the master and
comparing the size of each dir and file, I find some files where the sizes do
not match:

[root@storage-master repositories]# du --si
"/store1/SystemAdministration-OldVideos/SysAd Training/Technology
Basics/000_READ_ME_FIRST_FOR_INDEX_OF_VIDEOS.txt"
8.2k    /store1/SystemAdministration-OldVideos/SysAd Training/Technology
Basics/000_READ_ME_FIRST_FOR_INDEX_OF_VIDEOS.txt

[root@storage-slave compare]# du --si
"/store1/SystemAdministration-OldVideos/SysAd Training/Technology
Basics/000_READ_ME_FIRST_FOR_INDEX_OF_VIDEOS.txt"
4.1k    /store1/SystemAdministration-OldVideos/SysAd Training/Technology
Basics/000_READ_ME_FIRST_FOR_INDEX_OF_VIDEOS.txt

stat on the same file shows:

[root@storage-master repositories]# stat
"/store1/SystemAdministration-OldVideos/SysAd Training/Technology
Basics/000_READ_ME_FIRST_FOR_INDEX_OF_VIDEOS.txt"

Justin Piszcz | 6 Dec 2006 20:22

Re: File size differences

Have you run md5sum on the file on both sides?  If the checksums are the
same, the contents are identical and you have no problem.

% md5sum filename

Run it on each side and compare the output.

Justin.

On Wed, 6 Dec 2006, dushy wrote:

> Hey,
> 
> I have two identical machines, each set up with a RAID 5 array. One of them is
> used for failover, and data from the master is synced to it every day using
> rsync. The data on these disks is mostly intranet KBs, DBs, etc.
> 
> The RAID 5 arrays are formatted using the default options, i.e. mkfs.ext3
> /dev/Xda. The RAID controller is a 3ware Escalade and each disk in the RAID
> 5 array is a 400GB IDE drive.
> 
> Now the weird part is that after syncing the failover with the master and
> comparing the size of each dir and file, I find some files where the sizes do
> not match:
> 
> [root@storage-master repositories]# du --si
> "/store1/SystemAdministration-OldVideos/SysAd Training/Technology
> Basics/000_READ_ME_FIRST_FOR_INDEX_OF_VIDEOS.txt"
> 8.2k    /store1/SystemAdministration-OldVideos/SysAd Training/Technology
> Basics/000_READ_ME_FIRST_FOR_INDEX_OF_VIDEOS.txt
> 

Andreas Dilger | 6 Dec 2006 20:54

Re: File size differences

On Dec 06, 2006  15:38 +0000, dushy wrote:
> [root@storage-master repositories]# stat
> "/store1/SystemAdministration-OldVideos/SysAd Training/Technology
> Basics/000_READ_ME_FIRST_FOR_INDEX_OF_VIDEOS.txt"
>   File: `/store1/SystemAdministration-OldVideos/SysAd Training/Technology
> Basics/000_READ_ME_FIRST_FOR_INDEX_OF_VIDEOS.txt'
>   Size: 1126            Blocks: 16         IO Block: 4096   regular file
> Device: 801h/2049d      Inode: 10403842    Links: 1
> Access: (0775/-rwxrwxr-x)  Uid: (   48/  apache)   Gid: (   48/  apache)
> Access: 2006-09-11 12:22:24.000000000 +0530
> Modify: 2004-09-23 16:45:31.000000000 +0530
> Change: 2006-02-23 18:31:42.000000000 +0530
> 
> [root@storage-slave compare]# stat "/store1/SystemAdministration-OldVideos/SysAd
> Training/Technology Basics/000_READ_ME_FIRST_FOR_INDEX_OF_VIDEOS.txt"
>   File: `/store1/SystemAdministration-OldVideos/SysAd Training/Technology
> Basics/000_READ_ME_FIRST_FOR_INDEX_OF_VIDEOS.txt'
>   Size: 1126            Blocks: 8          IO Block: 4096   regular file
> Device: 801h/2049d      Inode: 23019536    Links: 1
> Access: (0775/-rwxrwxr-x)  Uid: (   48/  apache)   Gid: (   48/  apache)
> Access: 2001-01-28 21:10:14.000000000 +0530
> Modify: 2004-09-23 16:45:31.000000000 +0530
> Change: 2001-01-28 21:10:14.000000000 +0530

I'd suspect you have SELinux enabled on one of the nodes and not the
other?  It could also be ACLs. Either one is likely adding a 4kB extended
attribute (EA) block to each file.
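
The numbers in the stat output fit this: "Blocks" is counted in 512-byte
units, so 16 blocks is 8192 bytes on the master and 8 blocks is 4096 bytes on
the slave, a difference of exactly one 4kB filesystem block. A small sketch
for checking it on both machines follows. The function names are made up, and
it assumes getfattr (attr package) and getfacl (acl package) are installed.

```shell
#!/usr/bin/env bash
# Convert a stat "Blocks:" count (512-byte units) into allocated bytes.
allocated_bytes() {
    echo $(( $1 * 512 ))
}

# Show size, allocation, extended attributes and ACLs for one file.
# Run it on both machines and compare the output.
show_ea_usage() {
    local f=$1
    stat -c 'size=%s blocks=%b unit=%B' -- "$f"
    getfattr -d -m - -- "$f"   # dump all extended attributes (e.g. security.selinux)
    getfacl -- "$f"            # show POSIX ACLs, if any
}

allocated_bytes 16   # master: prints 8192
allocated_bytes 8    # slave:  prints 4096
```

If show_ea_usage lists attributes on the master that are missing on the
slave, the extra block is accounted for and the file contents are not affected.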

Cheers, Andreas
--
Andreas Dilger

Matija Nalis | 7 Dec 2006 21:18

Re: File size differences

On Wed, Dec 06, 2006 at 03:38:08PM +0000, dushy wrote:
> Now the weird part is, after syncing the failover with the master and comparing
> the size of each dir and file I find some files where the size mismatches..
>   Size: 1126            Blocks: 16         IO Block: 4096   regular file
>   Size: 1126            Blocks: 8          IO Block: 4096   regular file

Maybe those files contain enough zero bytes that rsync has made a
sparse file?

-- 
Opinions above are GNU-copylefted.

Bruno Wolff III | 9 Dec 2006 23:04

fsync, ext3, raid (md) 1, write barriers and PATA caching

I have been trying to figure out whether I can enable write caching on my
PATA hard drives (WD3200JB) and have fsync not return until data is
safely on the platters. I am also running software raid.
This is currently on FC5 (though soon to be FC6) with a 2.6.18 kernel.

From snippets I have found on the net, it looks like write barriers are
pushed down through software raid when using raid 1. So that if I mount
the file systems with data=ordered and barrier=1, I think I should be
OK, but I was hoping to get a more definitive answer.

It also looks like barrier=1 is or will be the default for ext3. Is there
a way I can check if this is the case on my system?

/proc/mounts doesn't show the barrier option when I use barrier=1 or don't
specify it at all. mount -lv shows the barrier option (when it was used
for mounting), but not the data option. I am not sure if either of these
are using the same data that the ext3 driver is using.

Christian Kujau | 9 Dec 2006 23:51

Re: fsync, ext3, raid (md) 1, write barriers and PATA caching

On Sat, 9 Dec 2006, Bruno Wolff III wrote:
> It also looks like barrier=1 is or will be the default for ext3. Is there
> a way I can check if this is the case on my system?

Hm, indeed: if write barriers are not available, mounting an XFS 
filesystem shows:

> Filesystem "md0": Disabling barriers, not supported by the underlying device

Mounting the same device when formatted with ext3 does not show this 
message nor does /proc/mounts reveal anything....could this be tweaked 
somehow?

Christian.
-- 
BOFH excuse #288:

Hard drive sleeping. Let it wake up on it's own...

Ric Wheeler | 11 Dec 2006 17:14

Re: fsync, ext3, raid (md) 1, write barriers and PATA caching


Bruno Wolff III wrote:
> I have been trying to figure out whether I can enable write caching on my
> PATA hard drives (WD3200JB) and have fsync not return until data is
> safely on the platters. I am also running software raid.
> This is currently on FC5 (though soon to be FC6) with a 2.6.18 kernel.
> 
> From snippets I have found on the net, it looks like write barriers are
> pushed down through software raid when using raid 1. So that if I mount
> the file systems with data=ordered and barrier=1, I think I should be
> OK, but I was hoping to get a more definitive answer.
> 
> It also looks like barrier=1 is or will be the default for ext3. Is there
> a way I can check if this is the case on my system?
> 
> /proc/mounts doesn't show the barrier option when I use barrier=1 or don't
> specify it at all. mount -lv shows the barrier option (when it was used
> for mounting), but not the data option. I am not sure if either of these
> are using the same data that the ext3 driver is using.

You can always do a sanity test on the barrier by timing how many synchronous 
files/sec you can create (i.e., create/write/fsync/close).  Speeds vary 
depending on what kind of drive you have, journal mode, etc, but you will always 
see much faster times with the barrier off than on while writing small files 
(say 10K).

regards,

ric

Bruno Wolff III | 11 Dec 2006 17:36

Re: fsync, ext3, raid (md) 1, write barriers and PATA caching

On Mon, Dec 11, 2006 at 11:14:52 -0500,
  Ric Wheeler <ric <at> emc.com> wrote:
> 
> You can always do a sanity test on the barrier by timing how many 
> synchronous files/sec you can create (i.e., create/write/fsync/close).  
> Speeds vary depending on what kind of drive you have, journal mode, etc, 
> but you will always see much faster times with the barrier off than on 
> while writing small files (say 10K).

That's probably a good idea in any case. Down the road I will be interested
in whether barriers work through encrypted file systems and this will be a good
test to have available.

I should get at most 120 commits per second if write barriers are working;
so I think that should be easy to detect.

Is there already a tool out there that does this? It shouldn't be hard to
write something simple, but maybe someone has written something fancy
already.
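
Something simple along these lines might do as a starting point. This is an
untested-on-real-hardware sketch, not an existing tool: the function name is
made up, it relies on GNU dd (conv=fsync) and GNU date (%N), and the cost of
starting dd for every file caps the rate it can report. The 120 figure
presumably comes from a 7200 rpm drive making 7200 / 60 = 120 revolutions per
second, so a rate well above that suggests the fsync is being answered from
the drive's write cache.

```shell
#!/usr/bin/env bash
# sync_create_rate DIR [COUNT] [SIZE]   (hypothetical helper name)
# Create COUNT files of SIZE bytes in DIR, fsync'ing each one, and print the
# approximate number of synchronous file creations per second.
sync_create_rate() {
    local dir=$1 count=${2:-200} size=${3:-10240}
    local start end i
    start=$(date +%s%N)              # nanoseconds since the epoch (GNU date)
    for ((i = 0; i < count; i++)); do
        # conv=fsync makes GNU dd fsync() the output file before it exits,
        # so each iteration is a create/write/fsync/close cycle.
        dd if=/dev/zero of="$dir/synctest.$i" bs="$size" count=1 conv=fsync 2>/dev/null
    done
    end=$(date +%s%N)
    rm -f -- "$dir"/synctest.*
    awk -v n="$count" -v ns="$(( end - start ))" \
        'BEGIN { if (ns <= 0) ns = 1; printf "%.1f\n", n * 1e9 / ns }'
}

# Example: run once per mount configuration and compare the two numbers.
# sync_create_rate /mnt/test 500 10240
```

Running it on the same filesystem mounted with barrier=1 and then barrier=0
should show the difference Ric describes, if barriers are reaching the disks.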
