Sergei Trofimovich | 9 Apr 2011 16:04

btrfs failure and BUG_ON behaviour

I decided to use UML after hitting some hard-to-debug
oopses on btrfs.

The first attempt to use btrfs on UML gave me a reproducible
UML crash. I suspect it's a UML problem.

So I have some questions here (I'm on x86_64,
2.6.39-rc2):

1. (major one) The BUG_ON trace in UML does not look like the
   one from a real kernel. The ud2 handler does not show a nice
   backtrace, but calls some suspicious handler instead.

/mnt/btr # touch asd-`seq 1 10000`
[   95.540000] Kernel panic - not syncing: Kernel mode signal 4
[   95.540000] Call Trace: 
[   95.540000] 602d9908:  [<60227a48>] panic+0xea/0x1e0
[   95.540000] 602d99b8:  [<600342ed>] do_softirq+0x4d/0x70
[   95.540000] 602d9a08:  [<60016947>] relay_signal+0x87/0xa0
[   95.540000] 602d9a18:  [<60021f70>] set_signals+0x30/0x40
[   95.540000] 602d9a38:  [<60021df9>] sig_handler_common+0x59/0xd0
[   95.540000] 602d9a58:  [<60021f2c>] real_alarm_handler+0x3c/0x50
[   95.540000] 602d9ae0:  [<6018c0a0>] strncpy+0x20/0x30
[   95.540000] 602d9b78:  [<6002200f>] sig_handler+0x3f/0x50
[   95.540000] 602d9b98:  [<600222a1>] handle_signal+0x71/0xb0
[   95.540000] 602d9be8:  [<60023630>] hard_handler+0x10/0x20
[   95.540000] 602d9ca8:  [<6011263c>] lookup_inline_extent_backref+0x2dc/0x3f0
[   95.540000] 
[   95.540000] 
[   95.540000] Pid: 1, comm: sh Not tainted 2.6.39-rc2+

richard -rw- weinberger | 9 Apr 2011 23:33

Re: btrfs failure and BUG_ON behaviour

On Sat, Apr 9, 2011 at 4:04 PM, Sergei Trofimovich <slyich <at> gmail.com> wrote:
> I decided to use UML after hitting some hard-to-debug
> oopses on btrfs.
>
> The first attempt to use btrfs on UML gave me a reproducible
> UML crash. I suspect it's a UML problem.
>
> So I have some questions here (I'm on x86_64,
> 2.6.39-rc2):
>
> 1. (major one) The BUG_ON trace in UML does not look like the
>   one from a real kernel. The ud2 handler does not show a nice
>   backtrace, but calls some suspicious handler instead.
>
> /mnt/btr # touch asd-`seq 1 10000`
> [   95.540000] Kernel panic - not syncing: Kernel mode signal 4
> [   95.540000] Call Trace:
> [   95.540000] 602d9908:  [<60227a48>] panic+0xea/0x1e0
> [   95.540000] 602d99b8:  [<600342ed>] do_softirq+0x4d/0x70
> [   95.540000] 602d9a08:  [<60016947>] relay_signal+0x87/0xa0
> [   95.540000] 602d9a18:  [<60021f70>] set_signals+0x30/0x40
> [   95.540000] 602d9a38:  [<60021df9>] sig_handler_common+0x59/0xd0
> [   95.540000] 602d9a58:  [<60021f2c>] real_alarm_handler+0x3c/0x50
> [   95.540000] 602d9ae0:  [<6018c0a0>] strncpy+0x20/0x30
> [   95.540000] 602d9b78:  [<6002200f>] sig_handler+0x3f/0x50
> [   95.540000] 602d9b98:  [<600222a1>] handle_signal+0x71/0xb0
> [   95.540000] 602d9be8:  [<60023630>] hard_handler+0x10/0x20
> [   95.540000] 602d9ca8:  [<6011263c>] lookup_inline_extent_backref+0x2dc/0x3f0
> [   95.540000]
> [   95.540000]

Sergei Trofimovich | 10 Apr 2011 08:56

Re: btrfs failure and BUG_ON behaviour

On Sat, 9 Apr 2011 23:33:14 +0200
richard -rw- weinberger <richard.weinberger <at> gmail.com> wrote:

> We have had some problems with the block layer in ~2.6.32 to 2.6.35.
> Commit 4752690 fixed the issues.
> 
> Can you please re-test it with 2.6.38?

Same crash with 2.6.38, so it's either a very sneaky btrfs bug or
some discrepancy in ubd.

> Does your script work with other file systems?

Tested on ext4, xfs, jfs and reiserfs3. All survive
the load of creating and deleting 1000 files.

In terms of on-disk load, reiserfs3 should be quite similar to btrfs,
so I expected it to show problems, but it didn't.

[n00bish speculation] btrfs also plays games with mm. Maybe
the problem sits somewhere in there, where the block layer/page
cache and the on-disk layout get out of sync.

[sf] ~/linux-2.6/fs/btrfs:fgrep -R PageDirty .
./extent_io.c:                          if (!PageDirty(pages[i]) ||
./extent_io.c:          if (!PageDirty(page))
./extent_io.c:          if (!PageDirty(page)) {
./disk-io.c:    if (PageDirty(page))
./disk-io.c:    if (PageWriteback(page) || PageDirty(page))
./disk-io.c:                    if (PageDirty(page)) {

Sergei Trofimovich | 10 Apr 2011 21:58

Re: btrfs failure and BUG_ON behaviour

> I'll try to send a bug report to the linux-btrfs mailing list first
> and return if I have more details.

I've bisected it down to the memcpy changes in commit
59daa706fbec745684702741b9f5373142dd9fdc and found
a piece of btrfs code passing overlapping areas to memcpy.
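For context, the C standard leaves memcpy undefined when source and destination overlap, while memmove is defined for that case, so code that happened to work with one memcpy implementation can break with another. A minimal sketch of the distinction follows. It assumes nothing about the actual btrfs code or its eventual fix, and regions_overlap and copy_checked are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return 1 if [a, a+n) and [b, b+n) share at least one byte. */
static int regions_overlap(const void *a, const void *b, size_t n)
{
	uintptr_t pa = (uintptr_t)a, pb = (uintptr_t)b;

	return n != 0 && pa < pb + n && pb < pa + n;
}

/* Copy n bytes, falling back to memmove() when the regions overlap. */
static void *copy_checked(void *dst, const void *src, size_t n)
{
	assert(dst != NULL && src != NULL);

	if (regions_overlap(dst, src, n))
		return memmove(dst, src, n);	/* defined for overlap */
	return memcpy(dst, src, n);		/* requires disjoint regions */
}
```

With char buf[] = "abcdefgh", the call copy_checked(buf + 2, buf, 4) shifts "abcd" two bytes to the right and leaves "ababcdgh" in the buffer. A plain memcpy with the same arguments has no guaranteed result.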

Sorry for the noise.

-- 

  Sergei
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel <at> lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel

richard -rw- weinberger | 12 Apr 2011 01:13

Re: btrfs failure and BUG_ON behaviour

On Sat, Apr 9, 2011 at 4:04 PM, Sergei Trofimovich <slyich <at> gmail.com> wrote:
> I decided to use UML after hitting some hard-to-debug
> oopses on btrfs.
>
> The first attempt to use btrfs on UML gave me a reproducible
> UML crash. I suspect it's a UML problem.
>
> So I have some questions here (I'm on x86_64,
> 2.6.39-rc2):
>
> 1. (major one) The BUG_ON trace in UML does not look like the
>   one from a real kernel. The ud2 handler does not show a nice
>   backtrace, but calls some suspicious handler instead.
>
> /mnt/btr # touch asd-`seq 1 10000`
> [   95.540000] Kernel panic - not syncing: Kernel mode signal 4
> [   95.540000] Call Trace:
> [   95.540000] 602d9908:  [<60227a48>] panic+0xea/0x1e0
> [   95.540000] 602d99b8:  [<600342ed>] do_softirq+0x4d/0x70
> [   95.540000] 602d9a08:  [<60016947>] relay_signal+0x87/0xa0
> [   95.540000] 602d9a18:  [<60021f70>] set_signals+0x30/0x40
> [   95.540000] 602d9a38:  [<60021df9>] sig_handler_common+0x59/0xd0
> [   95.540000] 602d9a58:  [<60021f2c>] real_alarm_handler+0x3c/0x50
> [   95.540000] 602d9ae0:  [<6018c0a0>] strncpy+0x20/0x30
> [   95.540000] 602d9b78:  [<6002200f>] sig_handler+0x3f/0x50
> [   95.540000] 602d9b98:  [<600222a1>] handle_signal+0x71/0xb0
> [   95.540000] 602d9be8:  [<60023630>] hard_handler+0x10/0x20
> [   95.540000] 602d9ca8:  [<6011263c>] lookup_inline_extent_backref+0x2dc/0x3f0
> [   95.540000]
> [   95.540000]

Sergei Trofimovich | 12 Apr 2011 12:13

Re: btrfs failure and BUG_ON behaviour

On Tue, 12 Apr 2011 01:13:30 +0200
richard -rw- weinberger <richard.weinberger <at> gmail.com> wrote:

> >   Here we have a BUG_ON in lookup_inline_extent_backref+0x2dc/0x3f0, but that pesky
> >   hard_handler. Is it a UML bug or known and expected behaviour?
> 
> Can you please test the attached patch?
> The trace was broken long before I started working on UML.

Attached are the BUG backtraces for a crash without and with the patch.
The patched one looks great!

One minor nit: Kbuild does not detect the header change, so I had to
'make clean' the build dir first.

Thanks!

-- 

  Sergei
/ # rm -rf /mnt/btr/var_tmp/paludis/dev-lang-jhc-9999/
[   50.680000] Kernel panic - not syncing: Kernel mode signal 4
[   50.680000] Call Trace: 
[   50.680000] 6023d918:  [<601a2342>] panic+0xea/0x1dc
[   50.680000] 6023d938:  [<6001c4a1>] ubd_intr+0x6d/0xc9
[   50.680000] 6023d978:  [<6004d5a9>] handle_irq_event_percpu+0x10a/0x126
[   50.680000] 6023d9c8:  [<6004d5e8>] handle_irq_event+0x23/0x2f
[   50.680000] 6023d9e8:  [<60018092>] free_irqs+0x72/0xdc

richard -rw- weinberger | 12 Apr 2011 13:22

Re: btrfs failure and BUG_ON behaviour

On Tue, Apr 12, 2011 at 12:13 PM, Sergei Trofimovich <slyich <at> gmail.com> wrote:
> On Tue, 12 Apr 2011 01:13:30 +0200
> richard -rw- weinberger <richard.weinberger <at> gmail.com> wrote:
>
>> >   Here we have a BUG_ON in lookup_inline_extent_backref+0x2dc/0x3f0, but that pesky
>> >   hard_handler. Is it a UML bug or known and expected behaviour?
>>
>> Can you please test the attached patch?
>> The trace was broken long before I started working on UML.
>
> Attached are the BUG backtraces for a crash without and with the patch.
> The patched one looks great!

Good.

> One minor nit: Kbuild does not detect the header change, so I had to
> 'make clean' the build dir first.

The patch adds only one file and does not touch anything else.
So Kbuild can't detect this.

> Thanks!
>
> --
>
>  Sergei
>

-- 
Thanks,

Nolan Leake | 13 Apr 2011 01:51

[PATCH] um: Add a "ucast" ethernet transport

The "ucast" transport is similar to the mcast transport (and, in fact,
shares most of its code), only it uses UDP unicast to move packets.

Obviously this is only useful for point-to-point connections between
virtual ethernet devices.

Signed-off-by: Nolan Leake <nolan <at> cumulusnetworks.com>
---
 Documentation/uml/UserModeLinux-HOWTO.txt |   10 +++
 arch/um/drivers/mcast.h                   |    7 ++-
 arch/um/drivers/mcast_kern.c              |   82 +++++++++++++++++++--
 arch/um/drivers/mcast_user.c              |  113 +++++++++++++++++------------
 4 files changed, 157 insertions(+), 55 deletions(-)

diff --git a/Documentation/uml/UserModeLinux-HOWTO.txt b/Documentation/uml/UserModeLinux-HOWTO.txt
index 9b7e190..5d0fc8b 100644
--- a/Documentation/uml/UserModeLinux-HOWTO.txt
+++ b/Documentation/uml/UserModeLinux-HOWTO.txt
@@ -1182,6 +1182,16 @@
   forge.net/>  and explains these in detail, as well as
   some other issues.

+  There is also a related point-to-point only "ucast" transport.
+  This is useful when your network does not support multicast, and
+  all network connections are simple point to point links.
+
+  The full set of command line options for this transport are
+
+
+       ethn=ucast,ethernet address,remote address,listen port,remote port
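To illustrate the idea behind the description above, here is a minimal userspace sketch of point-to-point UDP unicast: bind to a listen port, then send each frame to one fixed remote address and port. It is not the code from this patch. ucast_open and ucast_send are hypothetical names chosen to mirror the listen port, remote address and remote port options.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Open a UDP socket bound to listen_port on all local addresses.
 * Port 0 asks the host for an ephemeral port. Returns -1 on error. */
static int ucast_open(unsigned short listen_port)
{
	struct sockaddr_in local;
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd < 0)
		return -1;

	memset(&local, 0, sizeof(local));
	local.sin_family = AF_INET;
	local.sin_addr.s_addr = htonl(INADDR_ANY);
	local.sin_port = htons(listen_port);
	if (bind(fd, (struct sockaddr *)&local, sizeof(local)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Send one frame to the single remote peer.
 * Returns the number of bytes sent, or -1 on error. */
static ssize_t ucast_send(int fd, const char *remote_ip,
			  unsigned short remote_port,
			  const void *frame, size_t len)
{
	struct sockaddr_in remote;

	assert(frame != NULL);
	memset(&remote, 0, sizeof(remote));
	remote.sin_family = AF_INET;
	remote.sin_port = htons(remote_port);
	if (inet_pton(AF_INET, remote_ip, &remote.sin_addr) != 1)
		return -1;	/* remote_ip is not a dotted-quad IPv4 address */

	return sendto(fd, frame, len, 0,
		      (struct sockaddr *)&remote, sizeof(remote));
}
```

Two peers would each call ucast_open() with their own listen port and pass the other side's address and port to ucast_send(). Unlike multicast, no group membership is involved, which is why the link is strictly point-to-point.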

Richard Weinberger | 13 Apr 2011 12:58

Re: [PATCH] um: Add a "ucast" ethernet transport

Hi,

Am Mittwoch 13 April 2011, 01:51:33 schrieb Nolan Leake:
> The "ucast" transport is similar to the mcast transport (and, in fact,
> shares most of its code), only it uses UDP unicast to move packets.
> 
> Obviously this is only useful for point-to-point connections between
> virtual ethernet devices.
> 
> Signed-off-by: Nolan Leake <nolan <at> cumulusnetworks.com>
> ---
>  Documentation/uml/UserModeLinux-HOWTO.txt |   10 +++
>  arch/um/drivers/mcast.h                   |    7 ++-
>  arch/um/drivers/mcast_kern.c              |   82 +++++++++++++++++++--
>  arch/um/drivers/mcast_user.c              |  113 +++++++++++++++++------------
>  4 files changed, 157 insertions(+), 55 deletions(-)
> 

Sorry, your mailer mangled the patch.
Please use git send-email.

With your patch drivers/mcast* would contain unicast stuff.
This is a bit confusing.
Either rename the mcast files or move everything that mcast
and ucast have in common into a separate file.

Are you using this patch in a production environment?

Thanks,

James McMechan | 14 Apr 2011 21:50

Re: gcc-4.6.0 generates no code for sub_preempt_count()


> Am Donnerstag 14 April 2011, 09:49:14 schrieb Mikael Pettersson:
> > Richard Weinberger writes:
> > > Hi,
> > >
> > > I'm facing a very strange issue with gcc-4.6.0 and UML.
> > > Within __local_bh_enable() gcc generates no code for
> > > sub_preempt_count().
> > >
> > > See:
> > > http://userweb.kernel.org/~rw/uml-gcc460/__local_bh_enable-gcc460.txt
> > > vs.
> > > http://userweb.kernel.org/~rw/uml-gcc460/__local_bh_enable-gcc431.txt
> > >
> > > Interestingly it generates code for add_preempt_count().
> > > I can reproduce this on x86 and x86_64.
> > >
> > > The problem has to do with UML's current_thread_info() function.
> > > When I replace it with arch/x86's (unportable) variant gcc generates
> > > code.
> > >
> > > Any ideas whether this is a gcc or a kernel issue?

It looks like a gcc error in one of the optimization passes.

> > Please provide a standalone test case.
>
> There you go!
> http://userweb.kernel.org/~rw/uml-gcc460/testcase.c

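A rough userspace sketch of the kind of address masking under discussion, assuming that UML's current_thread_info() finds the thread_info at the base of a THREAD_SIZE-aligned stack by masking a stack address. That is an assumption about the implementation, not something confirmed in this thread. THREAD_SIZE, thread_info_from_addr and alloc_fake_stack are illustrative names, and the struct is reduced to a single field.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define THREAD_SIZE 8192UL	/* assumed stack size; must be a power of two */

struct thread_info {
	int preempt_count;	/* the only field this sketch needs */
};

/* Round any address inside a THREAD_SIZE-aligned stack down to its base,
 * where the thread_info is assumed to live. */
static struct thread_info *thread_info_from_addr(const void *addr_in_stack)
{
	uintptr_t mask = THREAD_SIZE - 1;

	return (struct thread_info *)((uintptr_t)addr_in_stack & ~mask);
}

/* Allocate a THREAD_SIZE-aligned block standing in for a kernel stack. */
static void *alloc_fake_stack(void)
{
	void *stack = aligned_alloc(THREAD_SIZE, THREAD_SIZE);

	assert(stack != NULL);
	return stack;
}
```

Any pointer into the block returned by alloc_fake_stack() maps back to the block's base. A compiler that can see the masked address comes from a local variable is free to reason about it in ways the author did not intend, which is one plausible way for such code to interact badly with an optimization pass.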

