James Braid | 1 Nov 01:32 2007

Re: Default mount options (that suck less).

On 31 Oct 2007, at 15:41, Chris Wedgwood wrote:
> On Wed, Oct 31, 2007 at 11:05:09AM +0000, James Braid wrote:
>> We have a ~100TB filesystem that was made with the default mkfs.xfs
>> options from memory. The only mount option we use is inode64.
> Weta?  Mostly very large files?

Not Weta, but another big VFX company.

This particular filesystem is used for nearline backups from a bunch  
of NFS servers (which run XFS as well - we love XFS). The average  
file size is only about a megabyte.
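Here is a rough sketch of why inode64 matters on a filesystem this size. It assumes the 256-byte inodes that mkfs.xfs created by default at the time. It also treats each 32-bit inode number as one inode-sized slot on disk, which simplifies XFS's AG/block/offset encoding. Under those assumptions, 32-bit inode numbers can only reach about the first 2^32 × 256 bytes = 1 TiB of the device. The helper name is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bytes from the start of the device reachable by
 * 32-bit inode numbers, modelling each inode number as one inode-sized
 * slot on disk (a simplification of the real XFS inode number encoding). */
static uint64_t inode32_reach_bytes(uint32_t inode_size)
{
    assert(inode_size > 0);
    return ((uint64_t)1 << 32) * inode_size;
}
```

With 256-byte inodes this gives 1 TiB, about 1% of a ~100TB filesystem. Without inode64, every inode would have to fit in that first slice. 512-byte inodes double the reach to 2 TiB.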

James Braid | 1 Nov 01:47 2007

Re: Default mount options (that suck less).

On 31 Oct 2007, at 11:27, Justin Piszcz wrote:
> Impressive, what architecture do you run? ia64 or x86_64?  What  
> performance differences did you see?

It's all just commodity hardware: an HP DL385 x86_64 server with a pile
of cheap Infortrend RAID arrays. Performance-wise we're limited by a
single HBA to the disks, which is fine for this particular
application because we saturate the network first.

xfs_repair takes a good 36 hours or so and 16-ish GB of memory to
run (we had to run it recently, thanks to a flaky RAID).

Lachlan McIlroy | 1 Nov 02:15 2007

Re: [PATCH] Implement fallocate

David Chinner wrote:
> XFS fallocate() callout.
> Allocate the range requested as unwritten extents. Atomically
> change the file size if requested.
> Signed-off-by: Dave Chinner <dgc <at> sgi.com>
> ---
>  fs/xfs/linux-2.6/xfs_iops.c |   45 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 45 insertions(+)
> Index: 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_iops.c
> ===================================================================
> --- 2.6.x-xfs-new.orig/fs/xfs/linux-2.6/xfs_iops.c	2007-10-30 10:18:59.061735503 +1100
> +++ 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_iops.c	2007-10-30 10:19:26.498185998 +1100
> @@ -52,6 +52,7 @@
>  #include <linux/xattr.h>
>  #include <linux/namei.h>
>  #include <linux/security.h>
> +#include <linux/falloc.h>
>  /*
>   * Bring the atime in the XFS inode uptodate.
> @@ -796,6 +797,49 @@ xfs_vn_removexattr(
>  	return namesp->attr_remove(vp, attr, xflags);
>  }
> +/*
> + * generic space allocation vector.
> + */
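The callout above is reached through the fallocate() system call. Below is a minimal userspace sketch. It assumes a C library that exposes the Linux fallocate() wrapper; glibc only gained it after this thread. The helper name is illustrative.

With FALLOC_FL_KEEP_SIZE, the range is preallocated (as unwritten extents on XFS, per the patch description) and the file size stays put. Without it, the size is extended to cover the range.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Illustrative helper: preallocate [0, len) in an open file, then stat it.
 * keep_size != 0 passes FALLOC_FL_KEEP_SIZE, so space is reserved but the
 * file size is left unchanged; otherwise the size grows to cover the range.
 * Returns 0 on success, or -1 with errno set (EOPNOTSUPP if the filesystem
 * does not implement fallocate). */
static int prealloc_and_stat(int fd, off_t len, int keep_size, struct stat *st)
{
    int mode = keep_size ? FALLOC_FL_KEEP_SIZE : 0;

    assert(len > 0);
    if (fallocate(fd, mode, 0, len) != 0)
        return -1;
    return fstat(fd, st);
}
```

The mode without FALLOC_FL_KEEP_SIZE corresponds to the "change the file size if requested" case in the patch description.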

Niv Sardi | 1 Nov 02:17 2007

Re: Default mount options (that suck less).

After discussing it with Dave, I changed my mind about the last patch; here's the updated version:

From ce672e92543fa99199ed23fa934dedf8d678924e Mon Sep 17 00:00:00 2001
From: Niv Sardi <xaiki <at> cxhome.ath.cx>
Date: Tue, 30 Oct 2007 12:26:35 +1100
Subject: [PATCH] less AGs for single disks configs.

get the underlying structure with get_subvol_stripe_wrapper(),
and pass sunit | swidth as an argument to calc_default_ag_geometry().

if it is set, get the AG sizes bigger.

this also cleans up a typo:
-       } else if (daflag)      /* User-specified AG size */
+       } else if (daflag)      /* User-specified AG count */
 xfsprogs/mkfs/xfs_mkfs.c |   18 ++++++++++++------
 1 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/xfsprogs/mkfs/xfs_mkfs.c b/xfsprogs/mkfs/xfs_mkfs.c
index 78c2c77..4cf9975 100644
--- a/xfsprogs/mkfs/xfs_mkfs.c
+++ b/xfsprogs/mkfs/xfs_mkfs.c
@@ -393,6 +393,7 @@ void
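Below is a simplified sketch of the idea in the subject line: fewer AGs when mkfs finds no stripe geometry. It assumes that a nonzero sunit or swidth means a multi-disk volume. The target AG counts of 4 and 16 are hypothetical placeholders, not the values this patch or mkfs.xfs uses, and the function is not calc_default_ag_geometry(). The only fixed rules it encodes are these:

- an AG is roughly 16 MiB to 1 TiB in size;
- the AG count is rounded up to cover every block.

```c
#include <assert.h>
#include <stdint.h>

/* Approximate XFS allocation-group size limits, in bytes. */
#define AG_MIN_BYTES ((uint64_t)1 << 24)   /* 16 MiB */
#define AG_MAX_BYTES ((uint64_t)1 << 40)   /* 1 TiB  */

/* Hypothetical target AG counts; NOT the values mkfs.xfs uses. */
#define SINGLE_DISK_AGCOUNT 4
#define STRIPED_AGCOUNT     16

/* Illustrative only: choose an AG size for dblocks blocks of
 * (1 << blocklog) bytes. Nonzero sunit/swidth is treated as a multi-disk
 * volume and gets more, smaller AGs; a plain disk gets fewer, larger AGs. */
static void sketch_ag_geometry(uint64_t dblocks, int blocklog,
                               int sunit, int swidth,
                               uint64_t *agsize, uint64_t *agcount)
{
    assert(blocklog > 0 && blocklog < 32);
    uint64_t target = (sunit || swidth) ? STRIPED_AGCOUNT : SINGLE_DISK_AGCOUNT;
    uint64_t min_blocks = AG_MIN_BYTES >> blocklog;
    uint64_t max_blocks = AG_MAX_BYTES >> blocklog;
    uint64_t size = (dblocks + target - 1) / target;   /* ceil(dblocks / target) */

    /* Keep each AG within the allowed size range. */
    if (size < min_blocks)
        size = min_blocks;
    if (size > max_blocks)
        size = max_blocks;

    *agsize = size;
    *agcount = (dblocks + size - 1) / size;            /* ceil(dblocks / size) */
}
```

On very large devices, the 1 TiB cap decides the AG count regardless of stripe geometry. For example, a 100 TiB filesystem ends up with 100 AGs in this sketch.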

Lachlan McIlroy | 1 Nov 02:47 2007

Re: [PATCH] fix transaction overrun during writeback

Looks good, Dave. Since this is a writeback path, is there some way
we can tell xfs_bmapi() that it should not convert anything but
delayed allocations, and have it assert/error out if it tries to? Not
that it will now with this change, but just as a defensive measure.

David Chinner wrote:
> Prevent transaction overrun in xfs_iomap_write_allocate() if we
> race with a truncate that overlaps the delalloc range we were
> planning to allocate.
> If we race, we may allocate into a hole and that requires block
> allocation. At this point in time we don't have a reservation for
> block allocation (apart from metadata blocks) and so allocating
> into a hole rather than a delalloc region results in overflowing
> the transaction block reservation.
> Fix it by only allowing a single extent to be allocated at a
> time.
> Signed-Off-By: Dave Chinner <dgc <at> sgi.com>
> ---
>  fs/xfs/xfs_iomap.c |   75 +++++++++++++++++++++++++++++++++++------------------
>  1 file changed, 50 insertions(+), 25 deletions(-)
> Index: 2.6.x-xfs-new/fs/xfs/xfs_iomap.c
> ===================================================================
> --- 2.6.x-xfs-new.orig/fs/xfs/xfs_iomap.c	2007-10-30 10:18:58.777772241 +1100
> +++ 2.6.x-xfs-new/fs/xfs/xfs_iomap.c	2007-10-30 10:19:30.365685668 +1100
> @@ -702,6 +702,9 @@ retry:
>   * the originating callers request.
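The following is a toy model, not XFS code, that combines the two ideas in this exchange:

- Dave's fix of converting one extent per transaction;
- Lachlan's suggestion of refusing, rather than performing, any allocation that is not a delalloc conversion.

The extent states, the function name and the use of -EIO are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy extent states; hypothetical, not XFS's in-core extent format. */
enum ext_state { EXT_HOLE, EXT_DELALLOC, EXT_REAL };

/* Convert delalloc extents in [start, end), one extent per "transaction".
 * Each transaction's reservation only covers a delalloc conversion, so if
 * a racing truncate has punched a hole in the range we stop with an error
 * instead of allocating into it. Returns the number of transactions used,
 * or -EIO on hitting a hole (earlier conversions are kept). */
static int convert_delalloc_range(enum ext_state *ext, size_t start, size_t end)
{
    int ntrans = 0;

    assert(start <= end);
    for (size_t i = start; i < end; i++) {
        if (ext[i] == EXT_REAL)
            continue;          /* already allocated, nothing to do */
        if (ext[i] == EXT_HOLE)
            return -EIO;       /* would need blocks we never reserved */
        ext[i] = EXT_REAL;     /* one extent converted per transaction */
        ntrans++;
    }
    return ntrans;
}
```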

Eric Sandeen | 1 Nov 03:27 2007

Re: Default mount options (that suck less).

Niv Sardi wrote:

> Eric: mkfs.conf sounds like a great idea to me, but i don't think it should
> be exclusive to XFS.

It's not, ext[234] already has it :)

It'd have to be specific to each mkfs, I think, since each mkfs has its 
own knobs.


Lachlan McIlroy | 1 Nov 07:19 2007

[PATCH] Turn off XBF_READ_AHEAD in io completion

Read-ahead of an inode cluster will set XBF_READ_AHEAD in the buffer.
If we don't remove the flag it will still be set when we flush the
buffer back to disk.  Not sure if leaving this flag set causes any
serious problems but it does trigger an assert.
Attachment (readahead.diff): text/x-patch, 371 bytes

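Below is a toy model of the fix described here; the 371-byte diff itself is not reproduced. The I/O completion path strips the transient read-ahead flag, so a later flush of the same buffer does not see it. The flag names and the struct are hypothetical stand-ins for the XBF_* flags and the XFS buffer.

```c
#include <assert.h>

/* Hypothetical flag bits, loosely modelled on the XBF_* names. */
#define BUF_READ       (1u << 0)
#define BUF_READ_AHEAD (1u << 1)   /* transient: meaningful for one I/O only */
#define BUF_DONE       (1u << 2)

struct toy_buf {
    unsigned int flags;
};

/* Completion handler: mark the buffer done and drop the per-I/O
 * read-ahead hint so it is no longer set when the buffer is flushed. */
static void toy_buf_ioend(struct toy_buf *bp)
{
    assert(bp);
    bp->flags |= BUF_DONE;
    bp->flags &= ~BUF_READ_AHEAD;
}
```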
Christoph Hellwig | 1 Nov 11:00 2007

Re: [PATCH] Turn off XBF_READ_AHEAD in io completion

On Thu, Nov 01, 2007 at 05:19:35PM +1100, Lachlan McIlroy wrote:
> Read-ahead of an inode cluster will set XBF_READ_AHEAD in the buffer.
> If we don't remove the flag it will still be set when we flush the
> buffer back to disk.  Not sure if leaving this flag set causes any
> serious problems but it does trigger an assert.

It might be better if such temporary flags never actually make it to
bp->b_flags.  Just pass down a flags variable all the way to
_xfs_buf_ioapply and keep the flags just for this I/O separate from
those that are permanent and in bp->b_flags.
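The toy contrast below shows the alternative suggested here. The read-ahead hint travels as an argument to the submission function and never lands in the buffer's persistent flags, so there is nothing to clear at completion. All names are hypothetical, and this is not the signature of _xfs_buf_ioapply.

```c
#include <assert.h>

/* Persistent buffer state (hypothetical). */
#define PBUF_DONE     (1u << 4)

/* Per-I/O hints (hypothetical); passed in, never stored in the buffer. */
#define IO_READ       (1u << 0)
#define IO_READ_AHEAD (1u << 1)

struct pbuf {
    unsigned int flags;   /* permanent state only */
};

/* Submit one I/O. The hint only affects this call: it selects a
 * best-effort request (returned as 1) instead of a normal one (0).
 * Nothing from io_flags is copied into bp->flags. */
static int pbuf_ioapply(struct pbuf *bp, unsigned int io_flags)
{
    int best_effort = (io_flags & IO_READ_AHEAD) != 0;

    assert(bp);
    /* A real implementation would build and submit the request here. */
    bp->flags |= PBUF_DONE;   /* only persistent state is updated */
    return best_effort;
}
```

A later write-back of the same buffer simply passes its own io_flags, so a stale read-ahead bit cannot leak into it.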

Emmanuel Florac | 1 Nov 19:58 2007

Re: 2.6TB Storage Size Problem

On Tue, 30 Oct 2007 23:30:04 -0400 (EDT) you wrote:

> 3) You can't boot from such a device (as neither grub nor lilo
> support gpt disklabels).

lilo has supported booting from GPT on Debian since at least Sarge. I'd
be surprised if the CentOS build doesn't.


Emmanuel Florac               www.intellique.com   
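Why does a 2.6TB device need a GPT disklabel at all? An msdos (MBR) partition table stores sector addresses as 32-bit numbers. The small arithmetic sketch below assumes 512-byte logical sectors; the helper name is illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes addressable with 32-bit sector numbers, as used by MBR/msdos
 * partition entries: 2^32 distinct sector addresses times the sector size. */
static uint64_t mbr_max_bytes(uint32_t sector_size)
{
    assert(sector_size > 0);
    return ((uint64_t)UINT32_MAX + 1) * sector_size;
}
```

With 512-byte sectors this comes to 2^41 bytes (2 TiB, about 2.2 TB), which is less than 2.6 TB. The whole device therefore cannot be described by an msdos label, and that is where GPT and its bootloader support come in.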

Jay Sullivan | 1 Nov 21:06 2007

xfs_force_shutdown called from file fs/xfs/xfs_trans_buf.c

I have an XFS filesystem on which the following has happened twice in 3
months, both times because an impossibly large block number was requested.
Unfortunately my logs don't go back far enough for me to know if it was
the _exact_ same block both times...  I'm running xfsprogs 2.8.21.
Excerpt from syslog (hostname obfuscated to 'servername' to protect the


Nov  1 14:06:32 servername dm-1: rw=0, want=39943195856896,

Nov  1 14:06:32 servername I/O error in filesystem ("dm-1") meta-data
dev dm-1 block 0x245400000ff8       ("xfs_trans_read_buf") error 5 buf
count 4096

Nov  1 14:06:32 servername xfs_force_shutdown(dm-1,0x1) called from line
415 of file fs/xfs/xfs_trans_buf.c.  Return address = 0xc02baa25

Nov  1 14:06:32 servername Filesystem "dm-1": I/O Error Detected.
Shutting down filesystem: dm-1

Nov  1 14:06:32 servername Please umount the filesystem, and rectify the


I ran xfs_repair -L on the FS and it could be mounted again, but how
long until it happens a third time?  What concerns me is that this is a
FS smaller than 4TB and 39943195856896 (or 0x245400000ff8) seems like a
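The sketch below is a worked check of the numbers in the log. It rests on two assumptions that the report does not state:

- the "block" in the XFS I/O error is a 512-byte basic-block address;
- the kernel's "want=" value is that start sector plus the length of the request in sectors.

The function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512u   /* assumed unit of both "block" and "want=" */

/* End sector the block layer would report: start plus request length. */
static uint64_t want_sector(uint64_t start_sector, uint32_t io_bytes)
{
    assert(io_bytes % SECTOR_SIZE == 0);
    return start_sector + io_bytes / SECTOR_SIZE;
}

/* Convert a 512-byte sector number into a byte offset on the device. */
static uint64_t sectors_to_bytes(uint64_t sectors)
{
    return sectors * SECTOR_SIZE;
}
```

Under those assumptions the two figures are consistent. 0x245400000ff8 is 39943195856888, and adding the 8 sectors of the 4096-byte buffer gives the logged want=39943195856896. That request lands about 2 × 10^16 bytes (roughly 18 PiB) into a device smaller than 4TB.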