Don Lewis | 1 Feb 02:00 2012

CFR patch to improve fsdb sparse file handling

A while back I noticed that fsdb bailed out early if it was asked to
print the block list for a file with a hole in its direct block list. At
the time I just commented out the early return.  Recently I had a chance
to revisit this code and came up with what I think is a better patch.
I thought it would be more informative to print out the NULL block
pointers to make it clear that the file is sparse even though the
existing code that handles the indirect blocks skips over the holes. The
code that printed the fragment count looked too much like an entry in an
obfuscated C contest, so I simplified it a bit.  I tried to match the
existing style instead of changing it to match style(9).

Index: sbin/fsdb/fsdbutil.c
===================================================================
--- sbin/fsdb/fsdbutil.c	(revision 230604)
+++ sbin/fsdb/fsdbutil.c	(working copy)
@@ -293,22 +293,21 @@
     printf("Blocks for inode %d:\n", inum);
     printf("Direct blocks:\n");
     ndb = howmany(DIP(dp, di_size), sblock.fs_bsize);
-    for (i = 0; i < NDADDR; i++) {
-	if (DIP(dp, di_db[i]) == 0) {
-	    putchar('\n');
-	    return;
-	}
+    for (i = 0; i < NDADDR && i < ndb; i++) {
 	if (i > 0)
 	    printf(", ");
 	blkno = DIP(dp, di_db[i]);
 	printf("%jd", (intmax_t)blkno);
-	if (--ndb == 0 && (offset = blkoff(&sblock, DIP(dp, di_size))) != 0) {

Kirk McKusick | 1 Feb 07:14 2012

Re: CFR patch to improve fsdb sparse file handling

Your change looks reasonable to me.

A more elaborate scheme (e.g., a compact listing) that I wrote
for printing out block numbers is given below. I am not sure
whether it is worth adapting for use in fsdb.

	Kirk McKusick

=-=-=

/*
 * Copyright (c) 1998 Marshall Kirk McKusick. All Rights Reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 *
 * THIS SOFTWARE IS PROVIDED BY MARSHALL KIRK MCKUSICK ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.  IN NO EVENT SHALL MARSHALL KIRK MCKUSICK BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT

Kirk McKusick | 1 Feb 07:26 2012

Re: kern/164472: [ufs] fsck -B panics on particular data inconsistency

> From: Kostik Belousov <kostikbel <at> gmail.com>
> To: bug-followup <at> FreeBSD.org, eugene <at> zhegan.in
> Cc:  
> Subject: Re: kern/164472: [ufs] fsck -B panics on particular data inconsistency
> Date: Mon, 30 Jan 2012 07:30:04 +0200
> 
>  You failed to mention which panic you got. Was it 'dup alloc'? A
>  backtrace would also be useful.
> 
>  If it was indeed 'dup alloc', then neither fsck nor snapshots are
>  to blame. Your filesystem is in an inconsistent state, which
>  requires a full fsck to recover. It must not be mounted until it
>  is repaired.
> 
>  Somewhat more interesting is how the fs got into this state.

Thanks for your report, and in particular for a small file image that
demonstrates the problem. I have been able to reproduce your panic
reliably on my test machine.

Running a normal fsck on the image does indeed show that the filesystem
has corruption that is unexpected on a filesystem running with soft
updates. So, in the end, if the background fsck were able to run, it
would fail and notify the system that it needed to be checked by a
full fsck. But as you have aptly demonstrated, the background fsck
crashes the system as it tries to take a snapshot of the filesystem
on which to run its check.

The crash occurs because, in taking a snapshot, the filesystem
needs to allocate an inode for the snapshot. As it turns out, the

Kirk McKusick | 1 Feb 07:30 2012

Re: kern/164472: [ufs] fsck -B panics on particular data inconsistency

The following reply was made to PR kern/164472; it has been noted by GNATS.

From: Kirk McKusick <mckusick <at> mckusick.com>
To: eugene <at> zhegan.in
Cc: bug-followup <at> FreeBSD.org, freebsd-fs <at> FreeBSD.org,
        Kostik Belousov <kostikbel <at> gmail.com>
Subject: Re: kern/164472: [ufs] fsck -B panics on particular data inconsistency 
Date: Tue, 31 Jan 2012 22:26:43 -0800


Scot Hetzel | 2 Feb 03:10 2012

Re: amd64/164516: unable to mount EXT2 filesystem

The following reply was made to PR kern/164516; it has been noted by GNATS.

From: Scot Hetzel <swhetzel <at> gmail.com>
To: vermaden <vermaden <at> interia.pl>
Cc: freebsd-gnats-submit <at> freebsd.org
Subject: Re: amd64/164516: unable to mount EXT2 filesystem
Date: Wed, 1 Feb 2012 19:33:51 -0600

 On Thu, Jan 26, 2012 at 9:22 AM, vermaden <vermaden <at> interia.pl> wrote:
 > # mount -t ext2 /dev/md0 /mnt/tmp0
 > mount: /dev/md0 : Operation not supported by device
 >
 You can't mount the ext2 filesystem because you are using the wrong
 filesystem type. According to the ext2fs(5) man page, you should use:

 mount -t ext2fs /dev/md0 /mnt/tmp0

 ext2fs(5) - http://www.freebsd.org/cgi/man.cgi?query=ext2fs&sektion=5&apropos=0&manpath=FreeBSD+9.0-RELEASE

 Scot
_______________________________________________
freebsd-fs <at> freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-fs
To unsubscribe, send any mail to "freebsd-fs-unsubscribe <at> freebsd.org"

Shelm | 2 Feb 11:40 2012

Re: Hast Unable to listen on address

Why does kern.hostuuid have the same value on both servers?


Jeremy Chadwick | 2 Feb 12:24 2012

Re: Hast Unable to listen on address

On Thu, Feb 02, 2012 at 02:40:06AM -0800, Shelm wrote:
> Why does kern.hostuuid have the same value on both servers?

Please read the code in /etc/rc.d/hostid to understand.  You will also
need to look at /etc/defaults/rc.conf to know what $hostid_file is.

kern.hostuuid and kern.hostid are generated on-the-fly when the system
does not have /etc/hostid.

You can reset this simply by removing the file and rebooting, or by
running "/etc/rc.d/hostid reset".  I do not believe a reboot will be
needed after doing the latter, but you will almost certainly have to
restart daemons.

If you read the above script it should (mostly) make sense.

Likely root cause:

When you set up these two systems, you probably mistakenly copied
/etc/hostid from one to the other (or you copied /etc from one to the
other).  Administrator error.

--

-- 
| Jeremy Chadwick                                 jdc <at> parodius.com |
| Parodius Networking                     http://www.parodius.com/ |
| UNIX Systems Administrator                 Mountain View, CA, US |
| Making life hard for others since 1977.             PGP 4BD6C0CB |


Peter Jeremy | 3 Feb 03:16 2012

ZFS boot problems revisited

I recently ran into the dreaded "all blocks unavailable" error whilst
upgrading to a recent 8-stable with a 3-way mirrored ZFS root.
Installing the latest gptzfsboot helped a bit, but it still reported
errors and the boot failed.  I was under the impression that the
latest boot code had resolved all the problems, but it seems there are
still some cases it can't cope with.  Comparing the code I built with
the latest head shows no relevant differences (the only changes are
disabling SSE3 & FP and changing a constant used for RAIDZ parity
calculations).

Are there any known cases that the boot code still doesn't handle?

The failures led me to investigate zfsboottest & zfsboottest.sh.
Unfortunately, these tools still have some problems: 
1) zfsboottest is still built as a native dynamic executable.  I'm not
   sure whether being dynamic presents a problem (I would hope not),
   but it should be an i386 executable on amd64.  The attached patch
   changes it to be a static i386 executable.
2) vfs.root.mountfrom is documented (in sys/kern/vfs_mountroot.c in
   9.x and later or sys/kern/vfs_mount.c in 8.x and earlier) to take a
   space-separated list of <vfsname>:[<path>].  zfsboottest.sh expects
   it to be a bare zpool name.  The attached patch adds the "zfs:"
   prefix but still limits it to a single item.
3) The "you may not be able to boot" message will never appear because
   the script tests the exit status of the preceding "rm -f" rather than
   that of the diff (as intended).  The attached patch fixes this.

I'm still not confident that the flags used to build zfsboottest are
equivalent to those used to build gptzfsboot but will leave that for
later investigation.


António Trindade | 3 Feb 14:09 2012

Re: kernel: panic: softdep_sync_buf: Unknown type jnewblk

Hi!

After a few days, I finally got a kernel panic like the one I reported earlier.

I attach the file info.0. I'm not attaching the vmcore.0 and core.txt.0 files, because they are over 100MB
in size. If needed, they can be downloaded from http://trindade.myphotos.cc/crash/vmcore.0.gz and http://trindade.myphotos.cc/crash/core.txt.0.

As a reminder, I am not using snapshots, at least not consciously.

I hope this helps diagnose the problem.

Meanwhile, I have disabled SU+J again and reverted to plain old SU.

Best regards.

------ BEGIN info.0 ------
Dump header from device /dev/ad0s1b
  Architecture: i386
  Architecture Version: 2
  Dump Length: 193622016B (184 MB)
  Blocksize: 512
  Dumptime: Thu Feb  2 22:57:19 2012
  Hostname: gatekeeper.darklair.homeunix.net
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 9.0-RELEASE #0: Sun Jan 15 01:22:14 WET 2012
    root <at> gatekeeper.darklair.homeunix.net:/usr/obj/usr/src/sys/GATEKEEPER
  Panic String: softdep_sync_buf: Unknown type jnewblk
  Dump Parity: 3424763946
  Bounds: 0
  Dump Status: good

Peter Maloney | 3 Feb 14:46 2012

Re: ZFS boot problems revisited

The causes of some ZFS boot failures are unknown; I don't know why it
happens. I updated 2 nearly identical systems to 8-STABLE from
8.2-RELEASE and (if I remember correctly) got the same error as you on
one but not the other.

I have only tried 2-way mirrors so far, so I might not know much about
this specific issue, but what comes to mind is what I would call the
'standard zfs boot fix', which I first found here:
http://freebsd.1045724.n5.nabble.com/Difficulties-to-use-ZFS-root-ROOT-MOUNT-ERROR-td4771828.html

It basically goes like this:

Boot off of something with zfs support (e.g. a DVD).
Then run these commands (assuming here your root is named "zroot").

zpool import -o altroot=/z -o cachefile=/tmp/zpool.cache zroot
zfs set mountpoint=/ zroot
cp /tmp/zpool.cache /z/boot/zfs/zpool.cache
shutdown -r now

The "mountpoint=/" part is required. Optionally, you can then set it
back to "legacy" before the reboot, if that is how you do things. I
prefer not to use "mountpoint=legacy", which most people seem to use and
which appears in all the howtos, because without it, if something goes
wrong, importing with altroot works without first unmounting /usr, /var,
etc. and remounting them all after / is mounted. (This matters for
things like chroot, but not for simply editing a config file that is in
the same dataset.)

Also, do not export the pool or skip the cache file step, or you will
get the same error that you started with. And if you messed up your

