kkeller | 1 Jul 2011 01:30

Re: xfs_growfs doesn't resize

Hello again all,

I apologize for following up my own post, but I found some new information.

On Thu 30/06/11  2:42 PM , kkeller <at> sonic.net wrote:

> http://oss.sgi.com/archives/xfs/2008-01/msg00085.html

I found a newer thread in the archives which might be more relevant to my issue:

http://oss.sgi.com/archives/xfs/2009-09/msg00206.html

But I haven't yet done a umount, and don't really wish to.  So, my followup questions are:

==Is there a simple way to figure out what xfs_growfs did, and whether it caused any problems?
==Will I be able to fix these problems, if any, without needing a umount?
==Assuming my filesystem is healthy, will a simple kernel update (and reboot of course!) allow me to resize
the filesystem in one step, instead of 2TB increments?

Again, many thanks!

--keith

--

-- 
kkeller <at> sonic.net

_______________________________________________
xfs mailing list
xfs <at> oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

Dave Chinner | 1 Jul 2011 04:22

Re: [PATCH 03/27] xfs: use write_cache_pages for writeback clustering

On Wed, Jun 29, 2011 at 10:01:12AM -0400, Christoph Hellwig wrote:
> Instead of implementing our own writeback clustering use write_cache_pages
> to do it for us.  This means the guts of the current writepage implementation
> become a new helper used both for implementing ->writepage and as a callback
> to write_cache_pages for ->writepages.  A new struct xfs_writeback_ctx
> is used to track block mapping state and the ioend chain over multiple
> invocations of it.
> 
> The advantage over the old code is that we avoid a double pagevec lookup,
> and a more efficient handling of extent boundaries inside a page for
> small blocksize filesystems, as well as having less XFS specific code.

It's not more efficient right now, due to a little bug:

> @@ -973,36 +821,38 @@ xfs_vm_writepage(
>  		 * buffers covering holes here.
>  		 */
>  		if (!buffer_mapped(bh) && buffer_uptodate(bh)) {
> -			imap_valid = 0;
> +			ctx->imap_valid = 0;
>  			continue;
>  		}
>  
>  		if (buffer_unwritten(bh)) {
>  			if (type != IO_UNWRITTEN) {
>  				type = IO_UNWRITTEN;
> -				imap_valid = 0;
> +				ctx->imap_valid = 0;
>  			}
>  		} else if (buffer_delay(bh)) {

Dave Chinner | 1 Jul 2011 06:18

Re: [PATCH 03/27] xfs: use write_cache_pages for writeback clustering

On Fri, Jul 01, 2011 at 12:22:48PM +1000, Dave Chinner wrote:
> On Wed, Jun 29, 2011 at 10:01:12AM -0400, Christoph Hellwig wrote:
> > Instead of implementing our own writeback clustering use write_cache_pages
> > to do it for us.  This means the guts of the current writepage implementation
> > become a new helper used both for implementing ->writepage and as a callback
> > to write_cache_pages for ->writepages.  A new struct xfs_writeback_ctx
> > is used to track block mapping state and the ioend chain over multiple
> > invocations of it.
> > 
> > The advantage over the old code is that we avoid a double pagevec lookup,
> > and a more efficient handling of extent boundaries inside a page for
> > small blocksize filesystems, as well as having less XFS specific code.
> 
> It's not more efficient right now, due to a little bug:
.....
> With the following patch, the trace output now looks like this for
> delalloc writeback:
> 
>            <...>-12623 [000] 694093.594883: xfs_writepage:        dev 253:16 ino 0x2300a5 pgoff 0x505000 size 0xa00000 offset 0 delalloc 1 unwritten 0
>            <...>-12623 [000] 694093.594884: xfs_writepage:        dev 253:16 ino 0x2300a5 pgoff 0x506000 size 0xa00000 offset 0 delalloc 1 unwritten 0
>            <...>-12623 [000] 694093.594884: xfs_writepage:        dev 253:16 ino 0x2300a5 pgoff 0x507000 size 0xa00000 offset 0 delalloc 1 unwritten 0
>            <...>-12623 [000] 694093.594885: xfs_writepage:        dev 253:16 ino 0x2300a5 pgoff 0x508000 size 0xa00000 offset 0 delalloc 1 unwritten 0
>            <...>-12623 [000] 694093.594885: xfs_writepage:        dev 253:16 ino 0x2300a5 pgoff 0x509000 size 0xa00000 offset 0 delalloc 1 unwritten 0
>            <...>-12623 [000] 694093.594886: xfs_writepage:        dev 253:16 ino 0x2300a5 pgoff 0x50a000 size 0xa00000 offset 0 delalloc 1 unwritten 0

Amit Sahrawat | 1 Jul 2011 06:30

Re: XFS and USB Hang on 2.6.35.13

On Thu, Jun 30, 2011 at 5:49 PM, Dave Chinner <david <at> fromorbit.com> wrote:
>
> On Thu, Jun 30, 2011 at 04:57:42PM +0530, Amit Sahrawat wrote:
> > Hi All,
> > I encountered a hang on XFS during unplug.
> > *Test Case:*
> > #!/bin/sh
> > index=0
> > while [ "$?" == 0 ]
> > do
> >         index=$(($index+1))
> >         sync
> >         cp /mnt/1KB.txt /tmp/"$index".test
> > done
> > Where /mnt - mount point for vfat and /tmp mount point for XFS, both can be
> > XFS also.
> >
> > During this operation, unplug the USB. I am getting a hang almost every
> > time I unplug.
>
> Well, that's no surprise. The unplug appears to be losing IOs in
> progress.
>
> > *Kernel Version:* 2.6.35.13 (extremely sorry, I know next question will be
> > why am I not using TOT kernel - I tried but my PC does not boot up with the
> > latest one)
> >
> > *Target=ARM*
> > *Logs Using Kernel Hung Task Feature*
> > # sh test.sh

Christoph Hellwig | 1 Jul 2011 10:51

Re: [PATCH 03/27] xfs: use write_cache_pages for writeback clustering

> 
> This piece of logic checks if the type of buffer has changed from the
> previous buffer. This used to work just fine, but now "type" is
> local to the __xfs_vm_writepage() function, while the imap lifetime
> spans multiple calls to the __xfs_vm_writepage() function. Hence
> type is reinitialised to IO_OVERWRITE on every page that is written,
> and so for delalloc we are invalidating the imap and looking it up
> again on every page. Traces show this sort of behaviour:

Ah crap.  I actually had it that way initially, but it got lost during
a rebase due to a minimal context change screwing most hunks of the
patch.

Thanks for tracking this down!
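
To illustrate the effect described in the quoted analysis, here is a minimal user-space sketch. It is not the XFS code: struct wb_ctx, write_one_page() and count_lookups() are hypothetical names, the enum values are stand-ins for the I/O types, and each page is assumed to hold a single buffer. It only counts how often the cached mapping gets rebuilt when the buffer type is carried across calls versus reset to IO_OVERWRITE on every page.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the I/O types discussed in the thread (assumed names). */
enum io_type { IO_OVERWRITE, IO_DELALLOC, IO_UNWRITTEN };

/* Hypothetical writeback context: state that must outlive a single page. */
struct wb_ctx {
	int		imap_valid;	/* is the cached mapping still usable? */
	enum io_type	type;		/* type of the last buffer processed */
	unsigned	lookups;	/* how many times the mapping was rebuilt */
};

/*
 * Process one page holding a single buffer of type bh_type.
 * keep_type == 0 forgets the previous type on entry, mimicking a local
 * variable that is re-initialised to IO_OVERWRITE on every call.
 * keep_type != 0 carries the type in the context across calls.
 */
static void write_one_page(struct wb_ctx *ctx, enum io_type bh_type,
			   int keep_type)
{
	enum io_type type;

	assert(ctx != NULL);
	type = keep_type ? ctx->type : IO_OVERWRITE;

	/* A change of buffer type invalidates the cached mapping. */
	if (type != bh_type) {
		type = bh_type;
		ctx->imap_valid = 0;
	}

	/* An invalid mapping has to be looked up again. */
	if (!ctx->imap_valid) {
		ctx->lookups++;
		ctx->imap_valid = 1;
	}
	ctx->type = type;
}

/* Write npages pages of the same buffer type and return the lookup count. */
static unsigned count_lookups(size_t npages, enum io_type bh_type,
			      int keep_type)
{
	struct wb_ctx ctx = { 0, IO_OVERWRITE, 0 };
	size_t i;

	for (i = 0; i < npages; i++)
		write_one_page(&ctx, bh_type, keep_type);
	return ctx.lookups;
}
```

Under these assumptions, count_lookups(6, IO_DELALLOC, 0) models the reported behaviour (one lookup per page), while count_lookups(6, IO_DELALLOC, 1) models the type living in the context (a single lookup for the whole run).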

Dave Chinner | 1 Jul 2011 11:20

Re: [PATCH 03/27] xfs: use write_cache_pages for writeback clustering

On Fri, Jul 01, 2011 at 04:59:58AM -0400, Christoph Hellwig wrote:
> > xfs: writepage context needs to handle discontiguous page ranges
> > 
> > From: Dave Chinner <dchinner <at> redhat.com>
> > 
> > If the pages sent down by write_cache_pages to the writepage
> > callback are discontiguous, we need to detect this and put each
> > discontiguous page range into individual ioends. This is needed to
> > ensure that the ioend accurately represents the range of the file
> > that it covers so that file size updates during IO completion set
> > the size correctly. Failure to take into account the discontiguous
> > ranges results in files being too small when writeback patterns are
> > non-sequential.
> 
> Looks good.  I still wonder why I haven't been able to hit this.
> Haven't seen any 180 failure for a long time, with both 4k and 512 byte
> filesystems and since yesterday 1k as well.

It requires the test to run the VM out of RAM and then force enough
memory pressure for kswapd to start writeback from the LRU. The
reproducer I have is a 1p, 1GB RAM VM with its disk image on a
100MB/s HW RAID1 w/ 512MB BBWC disk subsystem.

When kswapd starts doing writeback from the LRU, the iops rate goes
through the roof (from ~300 iops @ ~320k/io to ~7000 iops @ 4k/io) and
throughput drops from 100MB/s to ~30MB/s. BBWC is the only reason
the IOPS stays as high as it does - maybe that is why I saw this and
you haven't.
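
The figures above are self-consistent, which a quick calculation shows. The helper below is only a sketch: throughput_kib() is a hypothetical name, and it assumes "k" means KiB and that throughput is simply IOPS times I/O size.

```c
#include <assert.h>

/*
 * Throughput in KiB/s for a given IOPS rate and I/O size in KiB.
 * Hypothetical helper, assuming throughput = iops * io size.
 */
static unsigned long throughput_kib(unsigned long iops, unsigned long io_kib)
{
	assert(io_kib > 0);
	return iops * io_kib;
}
```

With these assumptions, 300 iops at 320k/io gives 96000 KiB/s (roughly 94 MiB/s, matching the ~100MB/s figure), and 7000 iops at 4k/io gives 28000 KiB/s (roughly 27 MiB/s, matching ~30MB/s).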

As it is, the kswapd writeback behaviour is utterly atrocious and,

Christoph Hellwig | 1 Jul 2011 10:59

Re: [PATCH 03/27] xfs: use write_cache_pages for writeback clustering

> xfs: writepage context needs to handle discontiguous page ranges
> 
> From: Dave Chinner <dchinner <at> redhat.com>
> 
> If the pages sent down by write_cache_pages to the writepage
> callback are discontiguous, we need to detect this and put each
> discontiguous page range into individual ioends. This is needed to
> ensure that the ioend accurately represents the range of the file
> that it covers so that file size updates during IO completion set
> the size correctly. Failure to take into account the discontiguous
> ranges results in files being too small when writeback patterns are
> non-sequential.

Looks good.  I still wonder why I haven't been able to hit this.
Haven't seen any 180 failure for a long time, with both 4k and 512 byte
filesystems and since yesterday 1k as well.

I'll merge this, and to avoid bisect regressions it'll have to go into
the main writepages patch.  That probably means folding the add_to_ioend
cleanup into it as well to not make the calling convention too ugly.
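
The idea in the quoted commit message, one ioend per contiguous run of pages, can be sketched in user space as follows. This is not the patch itself: struct page_range and split_ranges() are hypothetical names, and the input is assumed to be page indices in ascending order, as a stand-in for the order in which the callback sees them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an ioend: an inclusive range of page indices. */
struct page_range {
	unsigned long	first;
	unsigned long	last;
};

/*
 * Split ascending page indices into contiguous ranges.
 * A page extends the current range only if it directly follows the
 * previous page; otherwise a new range is started.
 * Returns the number of ranges written to out (at most max_out).
 */
static size_t split_ranges(const unsigned long *idx, size_t n,
			   struct page_range *out, size_t max_out)
{
	size_t i, nr = 0;

	assert(n == 0 || (idx != NULL && out != NULL));

	for (i = 0; i < n; i++) {
		/* Contiguous with the current range: just extend it. */
		if (nr > 0 && idx[i] == out[nr - 1].last + 1) {
			out[nr - 1].last = idx[i];
			continue;
		}
		/* Discontiguity (or first page): start a new range. */
		assert(nr < max_out);
		out[nr].first = idx[i];
		out[nr].last = idx[i];
		nr++;
	}
	return nr;
}
```

For the page indices 5, 6, 7, 10, 11, 20 this yields three ranges (5-7, 10-11 and 20-20), so each range describes exactly the part of the file it covers and a size update derived from it cannot span a gap.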

Dave Chinner | 1 Jul 2011 11:03

Re: XFS and USB Hang on 2.6.35.13

On Fri, Jul 01, 2011 at 10:00:54AM +0530, Amit Sahrawat wrote:
> On Thu, Jun 30, 2011 at 5:49 PM, Dave Chinner <david <at> fromorbit.com> wrote:
> > On Thu, Jun 30, 2011 at 04:57:42PM +0530, Amit Sahrawat wrote:
> > > Hi All,
> > > I encountered a hang on XFS during unplug.
> > > *Test Case:*
> > > #!/bin/sh
> > > index=0
> > > while [ "$?" == 0 ]
> > > do
> > >         index=$(($index+1))
> > >         sync
> > >         cp /mnt/1KB.txt /tmp/"$index".test
> > > done
> > > Where /mnt - mount point for vfat and /tmp mount point for XFS, both can be
> > > XFS also.
> > >
> > > During this operation, unplug the USB. I am getting a hang almost every
> > > time I unplug.
> >
> > Well, that's no surprise. The unplug appears to be losing IOs in
> > progress.
> >
> > > *Kernel Version:* 2.6.35.13 (extremely sorry, I know next question will be
> > > why am I not using TOT kernel - I tried but my PC does not boot up with the
> > > latest one)
.....
> > > *INFO: task khubd:*33 blocked for more than 120 seconds.
> > > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > khubd         D c06c261c     0    33      2 0x00000000

Christoph Hellwig | 1 Jul 2011 11:43

[PATCH 00/27] patch queue for Linux 3.1, V2

This is my current patch queue for Linux 3.1.  Compared to the last
posting all review comments were incorporated and two additional trivial
patches were added.  The ->writepages implementation was dropped for now,
given the bad situation of kswapd-originating writeback, but I'll repost
the fixed version separately to get feedback on the updated version.

Christoph Hellwig | 1 Jul 2011 11:43

[PATCH 27/27] xfs: avoid a few disk cache flushes

There is no need for a pre-flush when writing the second part of a
split log buffer, and if we are using an external log there is no need
to do a full cache flush of the log device at all given that all writes
to it use the FUA flag.

Signed-off-by: Christoph Hellwig <hch <at> lst.de>

Index: xfs/fs/xfs/xfs_log.c
===================================================================
--- xfs.orig/fs/xfs/xfs_log.c	2011-07-01 11:35:50.874088428 +0200
+++ xfs/fs/xfs/xfs_log.c	2011-07-01 11:35:51.287421756 +0200
@@ -1371,15 +1371,21 @@ xlog_sync(xlog_t		*log,
 	bp->b_flags |= XBF_SYNCIO;

 	if (log->l_mp->m_flags & XFS_MOUNT_BARRIER) {
+		bp->b_flags |= XBF_FUA;
+
 		/*
-		 * If we have an external log device, flush the data device
-		 * before flushing the log to make sure all meta data
-		 * written back from the AIL actually made it to disk
-		 * before writing out the new log tail LSN in the log buffer.
+		 * Flush the data device before flushing the log to make
+		 * sure all meta data written back from the AIL actually made
+		 * it to disk before stamping the new log tail LSN into the
+		 * log buffer.  For an external log we need to issue the
+		 * flush explicitly, and unfortunately synchronously here;
+		 * for an internal log we can simply use the block layer
+		 * state machine for preflushes.
 		 */