Tim Shimmin | 1 Jul 08:01 2008

TAKE 983924 - take back out again QUEUE_ORDERED_NONE test in check_barriers

Disable queue flag test in barrier check.

md raid1 can pass down barriers, but does not set an ordered flag
on the queue, so xfs does not even attempt a barrier write, and
will never use barriers on these block devices.

Remove the flag check and just let the barrier write
test determine barrier support.

A possible risk here is that if something does not set an ordered
flag and also does not properly return an error on a barrier write...
but if it's any consolation jbd/ext3/reiserfs never test the flag,
and don't even do a test write, they just disable barriers the first
time an actual journal barrier write fails.

Signed-off-by: Eric Sandeen <sandeen <at> sandeen.net>

Date:  Tue Jul  1 15:59:48 AEST 2008
Workarea:  chook.melbourne.sgi.com:/build/tes/2.6.x-xfs-quilt
Inspected by:  sandeen <at> sandeen.net

The following file(s) were checked into:
  longdrop.melbourne.sgi.com:/isms/linux/2.6.x-xfs-melb

Modid:  xfs-linux-melb:xfs-kern:31377a
fs/xfs/linux-2.6/xfs_super.c - 1.433 - changed
http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/linux-2.6/xfs_super.c.diff?r1=text&tr1=1.433&r2=text&tr2=1.432&f=h
	- disable queue flag test in barrier check
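
For illustration, with the flag test gone the mount-time check presumably reduces to something like the sketch below (not the actual diff; the function and helper names - xfs_mountfs_check_barriers(), xfs_barrier_test(), xfs_fs_cmn_err() - are assumptions based on the 2008-era xfs_super.c, and the external-log and read-only cases are omitted):

STATIC void
xfs_mountfs_check_barriers(xfs_mount_t *mp)
{
	int	error;

	/*
	 * No QUEUE_ORDERED_NONE test any more: devices such as md raid1
	 * pass barriers through without setting an ordered flag on the
	 * queue, so the flag proves nothing either way.  Just issue a
	 * trial barrier write and see what happens.
	 */
	error = xfs_barrier_test(mp);
	if (error) {
		xfs_fs_cmn_err(CE_NOTE, mp,
			"Disabling barriers, trial barrier write failed");
		mp->m_flags &= ~XFS_MOUNT_BARRIER;
	}
}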

Lachlan McIlroy | 1 Jul 08:21 2008

Re: [PATCH] Fix use after free when closing log/rt devices

Christoph Hellwig wrote:
> On Fri, Jun 27, 2008 at 03:14:46PM +1000, Lachlan McIlroy wrote:
>> The call to xfs_free_buftarg() will free the memory used by its argument,
>> so we need to save the bdev to pass to xfs_blkdev_put().
>>
>> Lachlan
>>
>> --- fs/xfs/linux-2.6/xfs_super.c_1.432	2008-06-27 14:51:17.000000000 +1000
>> +++ fs/xfs/linux-2.6/xfs_super.c	2008-06-27 14:59:26.000000000 +1000
>> @@ -781,13 +781,17 @@ STATIC void
>> xfs_close_devices(
>> 	struct xfs_mount	*mp)
>> {
>> +	struct block_device	*bdev;
>> +
>> 	if (mp->m_logdev_targp && mp->m_logdev_targp != mp->m_ddev_targp) {
>> +		bdev = mp->m_logdev_targp->bt_bdev;
>> 		xfs_free_buftarg(mp->m_logdev_targp);
>> -		xfs_blkdev_put(mp->m_logdev_targp->bt_bdev);
>> +		xfs_blkdev_put(bdev);
>> 	}
>> 	if (mp->m_rtdev_targp) {
>> +		bdev = mp->m_rtdev_targp->bt_bdev;
>> 		xfs_free_buftarg(mp->m_rtdev_targp);
>> -		xfs_blkdev_put(mp->m_rtdev_targp->bt_bdev);
>> +		xfs_blkdev_put(bdev);
>> 	}
> 
> Looks good, although two local variables inside the ifs might be cleaner:
> 
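
For illustration, the variant Christoph is suggesting would presumably look roughly like this (a sketch only, not quoted from the thread; the rest of xfs_close_devices() is unchanged):

STATIC void
xfs_close_devices(
	struct xfs_mount	*mp)
{
	if (mp->m_logdev_targp && mp->m_logdev_targp != mp->m_ddev_targp) {
		struct block_device	*logdev = mp->m_logdev_targp->bt_bdev;

		xfs_free_buftarg(mp->m_logdev_targp);
		xfs_blkdev_put(logdev);
	}
	if (mp->m_rtdev_targp) {
		struct block_device	*rtdev = mp->m_rtdev_targp->bt_bdev;

		xfs_free_buftarg(mp->m_rtdev_targp);
		xfs_blkdev_put(rtdev);
	}
	/* ... data device teardown as before ... */
}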

Dave Chinner | 1 Jul 08:44 2008

Re: Xfs Access to block zero exception and system crash

On Mon, Jun 30, 2008 at 03:54:44PM +0530, Sagar Borikar wrote:
> After running my test for 20 min, when I check the fragmentation status  
> of the file system, I observe that it
> is severely fragmented.

Depends on your definition of fragmentation....

> [root <at> NAS001ee5ab9c85 ~]# xfs_db -c frag -r /dev/RAIDA/vol
> actual 94343, ideal 107, fragmentation factor 99.89%

And that one is a bad one ;)

Still, there are a lot of extents - ~1000 to a file - which
will be stressing the btree extent format code.
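
(For reference, assuming xfs_db computes the factor as (actual - ideal) / actual, the numbers above work out as:

	(94343 - 107) / 94343 = 0.9989, i.e. 99.89%

and 94343 actual extents against 107 ideal ones is roughly 880 extents for every extent the ideal layout would need, which is where the ~1000-per-file figure comes from.)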

> Do you think this can cause the issue?

Sure - just like any other workload that generates enough
extents. Like I said originally, we've fixed so many problems
in this code since 2.6.18 that I'd suggest your only sane
hope for us to help you track down the problem is to upgrade
to a current kernel and go from there....

Cheers,

Dave.
-- 
Dave Chinner
david <at> fromorbit.com

Barry Naujok | 1 Jul 10:00 2008

REVIEW: xfs_repair fixes for bad directories

Two issues have been encountered with xfs_repair and badly corrupted
directories.

1. A huge size (inode di_size) can cause a huge malloc, which will fail.
    Patch dir_size_check.patch checks for a valid directory size
    and, if it's bad, junks the directory (see the sketch after
    this list). The di_size for a dir only counts the data blocks
    being used, not all the other associated metadata. This is
    limited to 32GB by the XFS_DIR2_LEAF_OFFSET value in XFS.
    Anything greater than this must be invalid.

2. An update a while ago to xfs_repair attempts to fix invalid
    ".." entries for subdirectories where there is a valid parent
    with the appropriate entry. It was a partial fix that never
    did the full job, especially if the subdirectory was short-
    form or had already been processed.

    Patch fix_dir_rebuild_without_dotdot_entry.patch creates a
    post-processing queue after the main scan to update any
    directories with an invalid ".." entry.
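
For illustration, the size check in dir_size_check.patch presumably boils down to something like this (a sketch only, not the actual patch; do_warn() is the usual xfs_repair helper, while the variable names and surrounding context are assumptions):

	xfs_fsize_t	size = be64_to_cpu(dino->di_core.di_size);

	/*
	 * A directory's di_size only covers its data blocks, which all
	 * live below XFS_DIR2_LEAF_OFFSET (32GB), so anything larger is
	 * garbage and would otherwise drive an enormous malloc when we
	 * try to pull the directory in.
	 */
	if (size > XFS_DIR2_LEAF_OFFSET) {
		do_warn("bad size %llu for directory inode %llu, junking it\n",
			(unsigned long long)size,
			(unsigned long long)lino);
		return 1;	/* caller clears/junks the directory */
	}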

Both these patches sit on top of the dinode.patch that has been
posted out for review previously.

Attachment (dinode.patch): text/x-patch, 61 KiB
Attachment (dir_size_check.patch): text/x-patch, 458 bytes

Christoph Litauer | 1 Jul 10:07 2008

Re: rfc: kill ino64 mount option

Mark Goodwin wrote:
> 
> 
> Dave Chinner wrote:
>> On Fri, Jun 27, 2008 at 05:39:28PM +0200, Christoph Hellwig wrote:
>>> Does anyone have objections to killing the ino64 mount option?  It's purely
>>> a debug tool to force inode numbers outside of the range representable
>>> in 32 bits, and is quite invasive for something that could easily be
>>> debugged by just having a large enough filesystem.
>>
>> It's the "large enough fs" that is the problem. XFSQA uses
>> small partitions for the most part, and this allows testing
>> of 64 bit inode numbers with a standard qa config.
>>
>> That being said, I don't really care if it goes or stays...
> 
> Although ino64 has interoperability issues with 32bit apps, it does
> have significant performance advantages over inode32 for some
> storage topologies and workloads, i.e. it's generally desirable to
> keep inodes near their data, but with large configs inode32 can't
> always oblige. ino64 is not just a debug tool.
> 
> We have a design proposal known as "inode32+" that essentially removes
> the direct mapping between inode number and disk offset. This will
> provide all the layout and performance benefits of ino64 without the
> interop issues.  Until inode32+ is available, we need to keep ino64.

Hi,

As I have massive performance problems using xfs with millions of

Christoph Hellwig | 1 Jul 10:08 2008

Re: [PATCH 1/3] Implement generic freeze feature

>  {
>  	struct super_block *sb;
>  
> +	if (test_and_set_bit(BD_FREEZE_OP, &bdev->bd_state))
> +		return ERR_PTR(-EBUSY);
> +
> +	sb = get_super(bdev);
> +
> +	/* If super_block has been already frozen, return. */
> +	if (sb && sb->s_frozen != SB_UNFROZEN) {
> +		drop_super(sb);
> +		clear_bit(BD_FREEZE_OP, &bdev->bd_state);
> +		return sb;
> +	}
> +
> +	if (sb)
> +		drop_super(sb);
> +
>  	down(&bdev->bd_mount_sem);
>  	sb = get_super(bdev);
>  	if (sb && !(sb->s_flags & MS_RDONLY)) {
> @@ -219,6 +234,8 @@ struct super_block *freeze_bdev(struct b
>  	}
>  
>  	sync_blockdev(bdev);
> +	clear_bit(BD_FREEZE_OP, &bdev->bd_state);
> +

Please only clear BD_FREEZE_OP in thaw_bdev, that way you can also get
rid of the frozen check above, and the double-get_super.  Also
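
For illustration, with that change freeze_bdev() would presumably end up looking roughly like this (a sketch only, keeping the patch's BD_FREEZE_OP/bd_state naming; the actual freeze steps are elided):

struct super_block *freeze_bdev(struct block_device *bdev)
{
	struct super_block *sb;

	/*
	 * The bit stays set until thaw_bdev() clears it, so a second
	 * freeze attempt fails with -EBUSY straight away and there is
	 * no need for a separate s_frozen check or an extra
	 * get_super()/drop_super() pair here.
	 */
	if (test_and_set_bit(BD_FREEZE_OP, &bdev->bd_state))
		return ERR_PTR(-EBUSY);

	down(&bdev->bd_mount_sem);
	sb = get_super(bdev);
	if (sb && !(sb->s_flags & MS_RDONLY)) {
		/* ... freeze the filesystem as before ... */
	}

	sync_blockdev(bdev);
	return sb;	/* BD_FREEZE_OP is cleared in thaw_bdev() */
}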

Christoph Hellwig | 1 Jul 10:10 2008

Re: [PATCH 3/3] Add timeout feature

I still disagree with this whole patch.  There is no reason to let
the freeze request time out - an auto-unfreeze will only confuse the
hell out of the caller.  The only case where the current XFS freeze
call can hang, and where this would be theoretically useful, is when the
filesystem is already frozen by someone else, but this should be fixed
by refusing to do the second freeze, as suggested in my comment to patch
1.

Christoph Hellwig | 1 Jul 10:13 2008

Re: REVIEW: xfs_repair fixes for bad directories

On Tue, Jul 01, 2008 at 06:00:17PM +1000, Barry Naujok wrote:
> Two issues have been encountered with xfs_repair and badly corrupted
> directories.
>
> 1. A huge size (inode di_size) can cause a huge malloc, which will fail.
>    Patch dir_size_check.patch checks for a valid directory size
>    and if it's bad, junks the directory. The di_size for a dir
>    only counts the data blocks being used, not all the other
>    associated metadata. This is limited to 32GB by the
>    XFS_DIR2_LEAF_OFFSET value in XFS. Anything greater than this
>    must be invalid.

This one looks good.

> 2. An update a while ago to xfs_repair attempts to fix invalid
>    ".." entries for subdirectories where there is a valid parent
>    with the appropriate entry. It was a partial fix that never
>    did the full job, especially if the subdirectory was short-
>    form or had already been processed.
>
>    Patch fix_dir_rebuild_without_dotdot_entry.patch creates a
>    post-processing queue after the main scan to update any
>    directories with an invalid ".." entry.

For this one I'll need to read the surrounding code first to do
a useful review, so it'll take some time.

Takashi Sato | 1 Jul 11:12 2008

Re: [dm-devel] [PATCH 0/3] freeze feature ver 1.8

Hi,

Alasdair G Kergon wrote:
>> Currently, ext3 in mainline Linux doesn't have the freeze feature which
>> suspends write requests.  So we cannot use the storage device's features
>> (snapshot and replication) to take a backup which keeps the filesystem
>> consistent while it is mounted.
>> In many cases, a commercial filesystem (e.g. VxFS) has
>> the freeze feature, and it can be used to get a consistent backup.
>> If Linux's standard filesystem ext3 had the freeze feature, we could do this
>> without a commercial filesystem.
>
> Is the following a fair summary?

Yes, you are right.
We'd like to use the freeze feature without device-mapper/LVM.

> 1. Some filesystems have a freeze/thaw feature.  XFS exports this to
> userspace directly through a couple of ioctls, but other filesystems
> don't.  For filesystems on device-mapper block devices it is exported to
> userspace through the DM_DEV_SUSPEND ioctl which LVM uses.
>
> 2. There is a desire to access this feature from userspace on non-XFS
> filesystems without having to use device-mapper/LVM.
>
> Alasdair

Cheers, Takashi 
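
For reference, the XFS-specific interface mentioned above is just a pair of ioctls, so a minimal userspace freeze/thaw looks roughly like this (a sketch with error handling trimmed; it assumes xfsprogs' <xfs/xfs_fs.h> for the XFS_IOC_FREEZE/XFS_IOC_THAW definitions, which is what xfs_freeze(8) uses under the covers):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <xfs/xfs_fs.h>		/* XFS_IOC_FREEZE, XFS_IOC_THAW */

int main(int argc, char **argv)
{
	int level = 1;
	int fd = open(argv[1], O_RDONLY);	/* the mount point */

	if (fd < 0)
		return 1;
	if (ioctl(fd, XFS_IOC_FREEZE, &level) < 0)	/* like xfs_freeze -f */
		perror("XFS_IOC_FREEZE");
	/* ... take the device-level snapshot here ... */
	if (ioctl(fd, XFS_IOC_THAW, &level) < 0)	/* like xfs_freeze -u */
		perror("XFS_IOC_THAW");
	close(fd);
	return 0;
}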

Alasdair G Kergon | 1 Jul 12:52 2008

Re: [dm-devel] Re: [PATCH 3/3] Add timeout feature

On Tue, Jul 01, 2008 at 04:10:26AM -0400, Christoph Hellwig wrote:
> I still disagree with this whole patch.  

Same here - if you want a timeout, what stops you from implementing it in a
userspace process?  If your concern is that the process might die without
thawing the filesystem, take a look at the userspace LVM/multipath code for
ideas - lock into memory, disable OOM killer, run from ramdisk etc.
In practice, those techniques seem to be good enough.
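
For what it's worth, the "lock into memory, disable OOM killer" part of that boils down to a few lines; a rough sketch (running from a ramdisk is left out, and /proc/self/oom_adj with -17 is the OOM_DISABLE knob of the day):

#include <stdio.h>
#include <sys/mman.h>

static int pin_self(void)
{
	FILE *f;

	/* lock all current and future pages into RAM */
	if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
		return -1;

	/* opt out of the OOM killer */
	f = fopen("/proc/self/oom_adj", "w");
	if (f) {
		fprintf(f, "%d\n", -17);	/* OOM_DISABLE */
		fclose(f);
	}
	return 0;
}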

> call can hang and this would be theoretically useful is when the
> filesystem is already frozen by someone else, but this should be fixed
> by refusing to do the second freeze, as suggested in my comment to patch
> 1.

Similarly if a device-mapper device is involved, how should the following
sequence behave - A, B or C?

1. dmsetup suspend (freezes)
2. FIFREEZE
3. FITHAW
4. dmsetup resume (thaws)

A:
  1 succeeds, freezes
  2 succeeds, remains frozen
  3 succeeds, remains frozen
  4 succeeds, thaws

B:
  1 succeeds, freezes