Dave Chinner | 1 Mar 04:16 2010

Re: Proposed patch for xfsprogs

On Sat, Feb 27, 2010 at 05:54:10PM -0500, C. Linus Hicks wrote:
> During my recent experience with having to reconstruct parts of an XFS
> filesystem that got corrupted as a result of several bad blocks, I found
> that some of the information displayed using "blockget -v" was pretty
> useless, and I am proposing the following code change to introduce a
> slight summarization.
> 
> Repeating lines of "setting block <foo> to <bar>" and "setting inode to
> <foo> for {,rt}block <bar>" will be summarized down to two lines.

Agreed, that would certainly help reduce the verbosity of the output.
However, I don't think the patch is correct.

> --- a/xfsprogs-3.1.1/db/check.c	2010-01-29 14:46:13.000000000 -0500
> +++ b/xfsprogs-3.1.1/db/check.c	2010-02-27 17:02:14.111418960 -0500
> @@ -1509,6 +1509,7 @@
>  {
>  	xfs_extlen_t	i;
>  	int		mayprint;
> +	int		isfirst = 1;
>  	char		*p;
>  
>  	if (!check_range(agno, agbno, len))  {
> @@ -1520,10 +1521,15 @@
>  	mayprint = verbose | blist_size;
>  	for (i = 0, p = &dbmap[agno][agbno]; i < len; i++, p++) {
>  		*p = (char)type2;
> -		if (mayprint && (verbose || CHECK_BLISTA(agno, agbno + i)))
> +		if (isfirst && mayprint && (verbose || CHECK_BLISTA(agno, agbno + i))) {
>  			dbprintf(_("setting block %u/%u to %s\n"), agno, agbno + i,

Christoph Hellwig | 1 Mar 10:51 2010

Re: [PATCH] Inode reclaim fixes (was Re: 2.6.31 xfs_fs_destroy_inode: cannot reclaim)

On Fri, Feb 26, 2010 at 10:45:53AM +1100, Dave Chinner wrote:
> Good to hear. The fixes are already in 2.6.33 (just released), so
> the question is whether we backport to 2.6.32 or not. Christoph,
> Alex, Eric - should we push these fixes back to .32-stable?

My latest patch to fix the locking for tag manipulations isn't in
any tree yet.  We should get it into mainline and 2.6.33-stable,
and, if the previous patches are backported, into .32-stable as well.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

Christoph Hellwig | 1 Mar 12:30 2010

xfs: fix locking for inode cache radix tree tag updates

The radix-tree code requires its users to serialize tag updates against
other updates to the tree.  While XFS protects tag updates against each
other it does not serialize them against updates of the tree contents,
which can lead to tag corruption.  Fix the inode cache to always take
pag_ici_lock in exclusive mode when updating radix tree tags.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Patrick Schreurs <patrick@news-service.com>
Tested-by: Patrick Schreurs <patrick@news-service.com>

Index: xfs/fs/xfs/linux-2.6/xfs_sync.c
===================================================================
--- xfs.orig/fs/xfs/linux-2.6/xfs_sync.c	2010-02-10 13:08:41.000000000 +0100
+++ xfs/fs/xfs/linux-2.6/xfs_sync.c	2010-02-10 15:53:28.739570272 +0100
@@ -687,12 +687,12 @@ xfs_inode_set_reclaim_tag(
 	struct xfs_perag *pag;

 	pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, ip->i_ino));
-	read_lock(&pag->pag_ici_lock);
+	write_lock(&pag->pag_ici_lock);
 	spin_lock(&ip->i_flags_lock);
 	__xfs_inode_set_reclaim_tag(pag, ip);
 	__xfs_iflags_set(ip, XFS_IRECLAIMABLE);
 	spin_unlock(&ip->i_flags_lock);
-	read_unlock(&pag->pag_ici_lock);
+	write_unlock(&pag->pag_ici_lock);
 	xfs_perag_put(pag);
 }

Index: xfs/fs/xfs/xfs_iget.c

C Linus Hicks | 1 Mar 15:37 2010

Re: Proposed patch for xfsprogs

On Mon, 2010-03-01 at 14:16 +1100, Dave Chinner wrote:
> On Sat, Feb 27, 2010 at 05:54:10PM -0500, C. Linus Hicks wrote:
> > During my recent experience with having to reconstruct parts of an XFS
> > filesystem that got corrupted as a result of several bad blocks, I found
> > that some of the information displayed using "blockget -v" was pretty
> > useless, and I am proposing the following code change to introduce a
> > slight summarization.
> > 
> > Repeating lines of "setting block <foo> to <bar>" and "setting inode to
> > <foo> for {,rt}block <bar>" will be summarized down to two lines.
> 
> Agreed, that would certainly help reduce the verbosity of the output.
> However, I don't think the patch is correct.
> 
> > --- a/xfsprogs-3.1.1/db/check.c	2010-01-29 14:46:13.000000000 -0500
> > +++ b/xfsprogs-3.1.1/db/check.c	2010-02-27 17:02:14.111418960 -0500
> > @@ -1509,6 +1509,7 @@
> >  {
> >  	xfs_extlen_t	i;
> >  	int		mayprint;
> > +	int		isfirst = 1;
> >  	char		*p;
> >  
> >  	if (!check_range(agno, agbno, len))  {
> > @@ -1520,10 +1521,15 @@
> >  	mayprint = verbose | blist_size;
> >  	for (i = 0, p = &dbmap[agno][agbno]; i < len; i++, p++) {
> >  		*p = (char)type2;
> > -		if (mayprint && (verbose || CHECK_BLISTA(agno, agbno + i)))
> > +		if (isfirst && mayprint && (verbose || CHECK_BLISTA(agno, agbno + i))) {

Alex Elder | 2 Mar 02:06 2010

Re: xfs: fix locking for inode cache radix tree tag updates

In-reply-to: <20100301113031.GA13217@infradead.org>

I've temporarily lost my ability to *read* my e-mail, but I
am able to blindly send this off...

> From: Christoph Hellwig <hch@infradead.org>
> 
> The radix-tree code requires it's users to serialize tag updates against
> other updates to the tree.  While XFS protects tag updates against each
> other it does not serialize them against updates of the tree contents,
> which can lead to tag corruption.  Fix the inode cache to always take
> pag_ici_lock in exclusive mode when updating radix tree tags.

I have one minor comment below, and point out a typo, but otherwise
this looks good.  I'll fix the typo myself.

As I understand it, the point here is that the tags associated
with a value stored in a radix tree are just as much "content"
as the value itself.  Therefore you need to use the same locking
protocol.  So we need to get the write lock to modify a tag value.

My comments require no action on your part, so...

Reviewed-by: Alex Elder <aelder@sgi.com>

> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reported-by: Patrick Schreurs <patrick@news-service.com>
> Tested-by: Patrick Schreurs <patrick@news-service.com>

. . .

Christoph Hellwig | 2 Mar 10:31 2010

Re: xfs: fix locking for inode cache radix tree tag updates

On Mon, Mar 01, 2010 at 07:06:42PM -0600, Alex Elder wrote:
> As I understand it, the point here is that the tags associated
> with a value stored in a radix tree are just as much "content"
> as the value itself.  Therefore you need to use the same locking
> protocol.  So we need to get the write lock to modify a tag value.

Yes.

> This succeeds at getting xfs_reclaim_inode() to skip over
> this one.  I think it's overloading the flag though--at least
> the name of the flag (which maybe is ambiguous anyway) doesn't
> really reflect what's happening here.  (The result is correct,
> however.)

We're kind of reclaiming the inode - just instead of freeing it
we're immediately reusing it.  We used the XFS_IRECLAIM flag
the same way before we started redoing the iget code.


Jason Vagalatos | 2 Mar 18:22 2010

Stalled xfs_repair on 100TB filesystem

Hello,
On Friday 2/26 I started an xfs_repair on a 100TB filesystem:

#> nohup xfs_repair -v -l /dev/logfs-sessions/logdev /dev/logfs-sessions/sessions >
/root/xfs_repair.out.logfs1.sjc.02262010 &

I've been monitoring the process with 'top' and tailing the output file from the redirect above.  I
believe the repair has "stalled".  While the process was running, 'top' showed almost all physical memory
consumed and 12.6G of virtual memory consumed by xfs_repair.  It made it all the way to Phase 6 and has been
sitting at agno = 14 for almost 48 hours.  The memory consumption of xfs_repair has ceased, but the process
is still "running" and consuming 100% CPU:

top - 10:10:37 up 3 days, 21:06,  1 user,  load average: 1.20, 1.13, 1.09
Tasks:   1 total,   1 running,   0 sleeping,   0 stopped,   0 zombie
Cpu(s): 12.5%us,  0.0%sy,  0.0%ni, 87.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   8177380k total,   896668k used,  7280712k free,   247100k buffers
Swap: 56525356k total,   173852k used, 56351504k free,   304588k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
32705 root      25   0  160m  95m  704 R  100  1.2  2629:53  xfs_repair

#> tail -f -n1000 xfs_repair.out.logfs1.sjc.02262010
........
        - agno = 98
        - agno = 99
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...

C Linus Hicks | 2 Mar 21:10 2010

Re: Proposed patch for xfsprogs

This patch reduces the verbosity of the xfs_db "blockget"
command, especially with "-v", by listing only the start and end block
numbers of a range rather than every individual block. For a
sample 50GB filesystem with around 50% free space and about 20 corrupted
sectors, the logfile size drops from 900MB to 3MB.

Here's my fixed patch:

--- a/xfsprogs-3.1.1/db/check.c	2010-01-29 14:46:13.000000000 -0500
+++ b/xfsprogs-3.1.1/db/check.c	2010-03-01 17:47:01.000000000 -0500
@@ -1509,6 +1509,8 @@
 {
 	xfs_extlen_t	i;
 	int		mayprint;
+	int		pcnt = 0;
+	int		first_prnt = -1;
 	char		*p;

 	if (!check_range(agno, agbno, len))  {
@@ -1520,10 +1522,25 @@
 	mayprint = verbose | blist_size;
 	for (i = 0, p = &dbmap[agno][agbno]; i < len; i++, p++) {
 		*p = (char)type2;
-		if (mayprint && (verbose || CHECK_BLISTA(agno, agbno + i)))
-			dbprintf(_("setting block %u/%u to %s\n"), agno, agbno + i,
-				typename[type2]);
+		if (mayprint && (verbose || CHECK_BLISTA(agno, agbno + i))) {
+			if (first_prnt == -1) {
+				dbprintf(_("setting block %u/%u to %s\n"),
+					 agno, agbno + i, typename[type2]);

Stan Hoeppner | 3 Mar 01:35 2010

Re: Stalled xfs_repair on 100TB filesystem

Jason Vagalatos put forth on 3/2/2010 11:22 AM:
> Hello,
> On Friday 2/26 I started an xfs_repair on a 100TB filesystem:
> 
> #> nohup xfs_repair -v -l /dev/logfs-sessions/logdev /dev/logfs-sessions/sessions >
> /root/xfs_repair.out.logfs1.sjc.02262010 &
> 
> I've been monitoring the process with 'top' and tailing the output file
> from the redirect above.  I believe the repair has "stalled".  When the
> process was running, 'top' showed almost all physical memory consumed
> and 12.6G of virt memory consumed by xfs_repair.  It made it all the
> way to Phase 6 and has been sitting at agno = 14 for almost 48 hours.
> The memory consumption of xfs_repair has ceased but the process is
> still "running" and consuming 100% CPU:

Here's how another user solved this xfs_repair "hanging" problem.  I say
"hang" because "stall" didn't return the right Google results.

http://marc.info/?l=linux-xfs&m=120600321509730&w=2

Excerpt:

"In betwenn I created a test filesystem 360GB with 120million inodes on it.
xfs_repair without options is unable to complete. If I run xfs_repair -o
bhash=8192 the repair process terminates normally (the filesystem is
actually ok)."

Unfortunately it appears you'll have to start the repair over again.

-- 
Stan


Dave Chinner | 3 Mar 01:25 2010

Re: Stalled xfs_repair on 100TB filesystem

On Tue, Mar 02, 2010 at 09:22:34AM -0800, Jason Vagalatos wrote:
> Hello, On Friday 2/26 I started an xfs_repair on a 100TB
> filesystem:
> 
> #> nohup xfs_repair -v -l /dev/logfs-sessions/logdev
> /dev/logfs-sessions/sessions >
> /root/xfs_repair.out.logfs1.sjc.02262010 &
> 
> I've been monitoring the process with 'top' and tailing the output
> file from the redirect above.  I believe the repair has
> "stalled".  When the process was running 'top' showed almost all
> physical memory consumed and 12.6G of virt memory consumed by
> xfs_repair.  It made it all the way to Phase 6 and has been
> sitting at agno = 14 for almost 48 hours.  The memory consumption
> of xfs_repair has ceased but the process is still "running" and
> consuming 100% CPU:

I wish we could reproduce hangs like this easily. I'd kill the
repair and run with the -P option. From the xfs_repair man page:

       -P     Disable prefetching of inode and directory blocks. Use
	      this option if you find xfs_repair gets stuck and stops
	      proceeding. Interrupting a stuck xfs_repair is safe.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

