[AT91RM9200] further run-time problems (jffs2, Oops in __update_rq_clock, IPSec)

Hi,

Now that our AT91RM9200-based system boots 2.6.23 and basically runs,
we are hitting further serious problems:

1. jffs2. After a few reboots we get lots of

JFFS2 notice: (708) jffs2_get_inode_nodes: Node header CRC failed at ...

These messages do not appear under 2.6.11. It looks like 2.6.11 simply
does not check for this case. The first few reboots under 2.6.23 are
also clean, so whatever triggers it happens later.
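
For readers unfamiliar with the message, here is a minimal sketch of what a "Node header CRC failed" check amounts to. It is not the kernel code. It assumes the common JFFS2 node header is 12 bytes (magic, nodetype, totlen, hdr_crc), that hdr_crc covers the first 8 bytes, that the CRC is the reflected CRC-32 polynomial with a seed of 0 and no final inversion, and that fields are stored little-endian as on a little-endian ARM target. All function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed header layout: magic(2) nodetype(2) totlen(4) hdr_crc(4). */
#define HDR_LEN     12
#define HDR_CRC_OFF 8

/* Bitwise reflected CRC-32 (polynomial 0xedb88320) with a caller-supplied
 * seed and no final inversion. Assumption: JFFS2 uses this with seed 0. */
static uint32_t crc32_le_seed(uint32_t crc, const uint8_t *p, size_t len)
{
    while (len--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ ((crc & 1u) ? 0xedb88320u : 0u);
    }
    return crc;
}

/* Read a 32-bit little-endian value from a byte buffer. */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Compute the CRC over the first 8 header bytes and store it in hdr_crc. */
void node_header_set_crc(uint8_t hdr[HDR_LEN])
{
    uint32_t crc = crc32_le_seed(0, hdr, HDR_CRC_OFF);

    hdr[HDR_CRC_OFF + 0] = (uint8_t)(crc & 0xff);
    hdr[HDR_CRC_OFF + 1] = (uint8_t)((crc >> 8) & 0xff);
    hdr[HDR_CRC_OFF + 2] = (uint8_t)((crc >> 16) & 0xff);
    hdr[HDR_CRC_OFF + 3] = (uint8_t)((crc >> 24) & 0xff);
}

/* Return 1 if the stored hdr_crc matches the recomputed one, else 0. */
int node_header_crc_ok(const uint8_t hdr[HDR_LEN])
{
    return crc32_le_seed(0, hdr, HDR_CRC_OFF) == load_le32(hdr + HDR_CRC_OFF);
}
```

A single flipped bit anywhere in the first 8 bytes makes `node_header_crc_ok()` return 0, so this kind of notice is consistent with either flash corruption or a bad read.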

Using physmap:

physmap platform flash device: 01000000 at 10000000
physmap-flash.0: Found 1 x16 devices at 0x0 in 16-bit bank
 Amd/Fujitsu Extended Query Table at 0x0040
physmap-flash.0: CFI does not contain boot bank location. Assuming top.
number of CFI chips: 1
cfi_cmdset_0002: Disabling erase-suspend-program due to code brokenness.

2. There has been an Oops once in vi...

Unable to handle kernel paging request at virtual address e5dcc3ec
pgd = c135c000
[e5dcc3ec] *pgd=00000000
Internal error: Oops: 5 [#1]
Modules linked in:
CPU: 0    Not tainted  (2.6.23.1-ga63c3b88-dirty #52)

Re: [AT91RM9200] further run-time problems (jffs2, Oops in __update_rq_clock, IPSec)

On Mon, Nov 12, 2007 at 03:45:04PM +0100, Guennadi Liakhovetski wrote:
> 2. There has been an Oops once in vi...
> 
> Unable to handle kernel paging request at virtual address e5dcc3ec
> pgd = c135c000
> [e5dcc3ec] *pgd=00000000
> Internal error: Oops: 5 [#1]
> Modules linked in:
> CPU: 0    Not tainted  (2.6.23.1-ga63c3b88-dirty #52)
> PC is at __update_rq_clock+0x4c/0x140
> LR is at __update_rq_clock+0x28/0x140
> pc : [<c0033e38>]    lr : [<c0033e14>]    psr: 60000093
> sp : c1517b08  ip : e5dcc010  fp : c0117b3c
> r10: c025125c  r9 : 00000001  r8 : 00000000
> r7 : 00000000  r6 : c1cc9720  r5 : e5dcc3ec  r4 : e2be3800
> r3 : 00989680  r2 : ffffd430  r1 : 00989665  r0 : e2be3800
> Flags: nZCv  IRQs off nt user
> Control: c000717f  Table: 2135c000  DAC: 00000015
> Process vi (pid: 2017, stack limit = 0xc1516258)
> Stack: (0xc1517b08 to 0xc1518000)
> Backtrace: frame pointer underflow
^^^^^^^^^^^^^^ hint.

> Backtrace aborted due to bad frame pointer <c0117b3c>
> Code: e0c88005 e51bc034 e3580000 e28c5ff7 (e8950060)
> 
> __update_rq_clock:
>         @ args = 0, pretend = 0, frame = 12
>         @ frame_needed = 1, uses_anonymous_args = 0
>         mov     ip, sp  @,
>         stmfd   sp!, {r4, r5, r6, r7, r8, r9, sl, fp, ip, lr, pc}       @,
>         sub     fp, ip, #4      @,,
>         sub     sp, sp, #12     @,,
>         ldr     r3, .L33+8      @ tmp108,
>         add     r4, r0, #996    @ prev_raw, rq,
>         ldmia   r4, {r4-r5}     @ prev_raw
>         str     r0, [fp, #-52]  @ tmp12,
>         mov     lr, pc
>         bx      r3      @ tmp108
>         str     r0, [fp, #-48]  @, now
>         str     r1, [fp, #-44]  @, now
>         mov     r8, r1  @ delta,
>         mov     r7, r0  @ delta,
>         subs    r7, r7, r4      @ delta, delta, prev_raw
>         sbc     r8, r8, r5      @ delta, delta, prev_raw
>         ldr     ip, [fp, #-52]  @,
>         cmp     r8, #0  @ delta,
>         add     r5, ip, #988    @ clock.432, rq,
>         ldmia   r5, {r5-r6}     @ clock.432
> 	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Notice where 'r5' comes from - 'ip', which comes from '[fp, #-52]'.
Now read through from the start of the function and see what value 'fp'
is supposed to have.
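
As a quick cross-check of where the faulting address comes from, the arithmetic can be written out as a small helper. The function name is made up for illustration; the offset 988 and the register values are taken from the listing and the register dump above.

```c
#include <stdint.h>

/* Immediate used by "add r5, ip, #988" in the listing. */
#define RQ_CLOCK_OFFSET 988u

/* Address that "ldmia r5, {r5-r6}" reads from, given the value loaded
 * into ip from [fp, #-52]. */
uint32_t clock_load_address(uint32_t ip_value)
{
    return ip_value + RQ_CLOCK_OFFSET;
}
```

With ip = 0xe5dcc010 from the dump this gives 0xe5dcc3ec, which is both the value in r5 and the address in "Unable to handle kernel paging request". So the bad pointer came out of the stack slot at [fp, #-52].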

sp at the ldmia instruction is currently at 0xc1517b08.  Add 12.
There's 11 registers pushed into the stack, so add 44 bytes.  This
is the value of 'ip' and 'sp' after the first instruction.  This
gives a value of 0xc1517b40.

To confirm this, here's the state which the stmfd instruction saved
onto the stack:

7b00:                   c0314974 c1517b18 c0046378 ffffd43a c1cc9860 c1cc9720
                           A        B        C       r4       r5       r6
7b20: c02f4554 c1516000 00000001 c025125c c1517b64 c1517b40 c0250a74 c0033dfc
        r7       r8       r9       sl       fp       ip       lr       pc

That means 'fp' after the first 'sub' instruction should be 0xc1517b3c.
However, it is actually 0xc0117b3c.  Note that these two values look
very similar.  The difference is only 0x01400000.  Two bit errors in
SDRAM?
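
The frame pointer arithmetic and the bit comparison can be reproduced with a few lines of C. This is only a sketch of the calculation described above; the helper names are invented, and the inputs are the values from the oops.

```c
#include <stdint.h>

/* sp on function entry: undo "sub sp, sp, #locals" and the stmfd push
 * of nregs registers, 4 bytes each. */
uint32_t entry_sp(uint32_t sp_now, uint32_t locals, uint32_t nregs)
{
    return sp_now + locals + 4u * nregs;
}

/* fp after "sub fp, ip, #4", where ip holds the entry sp. */
uint32_t expected_fp(uint32_t sp_at_entry)
{
    return sp_at_entry - 4u;
}

/* Number of bit positions in which two 32-bit values differ. */
unsigned differing_bits(uint32_t a, uint32_t b)
{
    uint32_t x = a ^ b;
    unsigned n = 0;

    while (x) {
        x &= x - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}
```

For sp = 0xc1517b08, 12 bytes of locals and 11 pushed registers, `entry_sp()` gives 0xc1517b40 and `expected_fp()` gives 0xc1517b3c. Comparing that with the reported 0xc0117b3c, the XOR is 0x01400000: bits 24 and 22 are set in the expected value and clear in the actual one.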

The other thing to consider is sched_clock() - the 'bx' instruction
you've marked above is calling that function.  Is it messing up the
frame pointer?

Also note the r8, r1 and r5 values.  The code immediately before which

Adrian Bunk | 27 Oct 16:18

jffs2_init_acl_post() can return uninitialized variable

Commit cfc8dc6f6f69ede939e09c2af06a01adee577285 added the following 
function that can return the value of an uninitialized variable:

<--  snip  -->

...
int jffs2_init_acl_post(struct inode *inode)
{
        struct jffs2_inode_info *f = JFFS2_INODE_INFO(inode);
        int rc;

        if (f->i_acl_default) {
                rc = __jffs2_set_acl(inode, JFFS2_XPREFIX_ACL_DEFAULT, f->i_acl_default);
                if (rc)
                        return rc;
        }

        if (f->i_acl_access) {
                rc = __jffs2_set_acl(inode, JFFS2_XPREFIX_ACL_ACCESS, f->i_acl_access);
                if (rc)
                        return rc;
        }

        return rc;
}
...

<--  snip  -->

Spotted by the Coverity checker.
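
To make the problem concrete outside the kernel, here is a small stand-alone model of the same control flow. The struct and the setter are hypothetical stand-ins for `jffs2_inode_info` and `__jffs2_set_acl()`. The initializer shows one possible way to give the function a defined result, assuming 0 (success) is what callers expect when no ACL is present.

```c
#include <stddef.h>

/* Hypothetical stand-in for the two ACL pointers in jffs2_inode_info.
 * Each pointer, when non-NULL, points at the return code the fake
 * setter should produce. */
struct toy_inode_info {
    const int *i_acl_default;
    const int *i_acl_access;
};

/* Hypothetical stand-in for __jffs2_set_acl(): returns the code it is given. */
static int toy_set_acl(const int *acl)
{
    return *acl;
}

int toy_init_acl_post(const struct toy_inode_info *f)
{
    int rc = 0;   /* without this, rc is indeterminate when both are NULL */

    if (f->i_acl_default) {
        rc = toy_set_acl(f->i_acl_default);
        if (rc)
            return rc;
    }

    if (f->i_acl_access) {
        rc = toy_set_acl(f->i_acl_access);
        if (rc)
            return rc;
    }

    return rc;
}
```

With both pointers NULL neither branch assigns rc, which is exactly the path on which the original function returns an uninitialized value.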

WANG Cong | 27 Oct 16:47

[Git Patch]fs/jffs2/acl.c: Fix a may-be-uninitialized return value


Fix a may-be-uninitialized return value.

Found-by: Adrian Bunk <bunk <at> kernel.org>
Signed-off-by: WANG Cong <xiyou.wangcong <at> gmail.com>

---
 fs/jffs2/acl.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/jffs2/acl.c b/fs/jffs2/acl.c
index 9728614..5b14062 100644
--- a/fs/jffs2/acl.c
+++ b/fs/jffs2/acl.c
@@ -358,7 +358,7 @@ int jffs2_init_acl_pre(struct inode *dir_i, struct inode *inode, int *i_mode)
 int jffs2_init_acl_post(struct inode *inode)
 {
 	struct jffs2_inode_info *f = JFFS2_INODE_INFO(inode);
-	int rc;
+	int rc = 0;

 	if (f->i_acl_default) {
 		rc = __jffs2_set_acl(inode, JFFS2_XPREFIX_ACL_DEFAULT, f->i_acl_default);

--

-- 
May the Source Be With You.

Adrian Bunk | 24 Oct 18:27

[2.6 patch] make jffs2_get_acl() static

jffs2_get_acl() can now become static again.

Signed-off-by: Adrian Bunk <bunk <at> kernel.org>

---

 fs/jffs2/acl.c |    2 +-
 fs/jffs2/acl.h |    2 --
 2 files changed, 1 insertion(+), 3 deletions(-)

add2b887d64536f3fe978e62f0774292456f1ddb 
diff --git a/fs/jffs2/acl.c b/fs/jffs2/acl.c
index 9728614..b14e805 100644
--- a/fs/jffs2/acl.c
+++ b/fs/jffs2/acl.c
@@ -176,7 +176,7 @@ static void jffs2_iset_acl(struct inode *inode, struct posix_acl **i_acl, struct
 	spin_unlock(&inode->i_lock);
 }

-struct posix_acl *jffs2_get_acl(struct inode *inode, int type)
+static struct posix_acl *jffs2_get_acl(struct inode *inode, int type)
 {
 	struct jffs2_inode_info *f = JFFS2_INODE_INFO(inode);
 	struct posix_acl *acl;
diff --git a/fs/jffs2/acl.h b/fs/jffs2/acl.h
index 76c6ebd..0bb7f00 100644
--- a/fs/jffs2/acl.h
+++ b/fs/jffs2/acl.h
@@ -28,7 +28,6 @@ struct jffs2_acl_header {


KaiGai Kohei | 25 Oct 04:19

Re: [2.6 patch] make jffs2_get_acl() static

Adrian Bunk wrote:
> jffs2_get_acl() can now become static again.
> 
> Signed-off-by: Adrian Bunk <bunk <at> kernel.org>

Acked-by: KaiGai Kohei <kaigai <at> ak.jp.nec.com>

> ---
> 
>  fs/jffs2/acl.c |    2 +-
>  fs/jffs2/acl.h |    2 --
>  2 files changed, 1 insertion(+), 3 deletions(-)
> 
> add2b887d64536f3fe978e62f0774292456f1ddb 
> diff --git a/fs/jffs2/acl.c b/fs/jffs2/acl.c
> index 9728614..b14e805 100644
> --- a/fs/jffs2/acl.c
> +++ b/fs/jffs2/acl.c
> @@ -176,7 +176,7 @@ static void jffs2_iset_acl(struct inode *inode, struct posix_acl **i_acl, struct
>  	spin_unlock(&inode->i_lock);
>  }
>  
> -struct posix_acl *jffs2_get_acl(struct inode *inode, int type)
> +static struct posix_acl *jffs2_get_acl(struct inode *inode, int type)
>  {
>  	struct jffs2_inode_info *f = JFFS2_INODE_INFO(inode);
>  	struct posix_acl *acl;
> diff --git a/fs/jffs2/acl.h b/fs/jffs2/acl.h
> index 76c6ebd..0bb7f00 100644
> --- a/fs/jffs2/acl.h
> +++ b/fs/jffs2/acl.h
> @@ -28,7 +28,6 @@ struct jffs2_acl_header {
>  
>  #define JFFS2_ACL_NOT_CACHED ((void *)-1)
>  
> -extern struct posix_acl *jffs2_get_acl(struct inode *inode, int type);
>  extern int jffs2_permission(struct inode *, int, struct nameidata *);
>  extern int jffs2_acl_chmod(struct inode *);
>  extern int jffs2_init_acl_pre(struct inode *, struct inode *, int *);
> @@ -40,7 +39,6 @@ extern struct xattr_handler jffs2_acl_default_xattr_handler;
>  
>  #else
>  
> -#define jffs2_get_acl(inode, type)		(NULL)
>  #define jffs2_permission			(NULL)
>  #define jffs2_acl_chmod(inode)			(0)
>  #define jffs2_init_acl_pre(dir_i,inode,mode)	(0)
> 
> 


Erez Zadok | 19 Oct 08:05

BUG at mm/filemap.c:1749 (2.6.24, jffs2, unionfs)

David,

I'm testing unionfs on top of jffs2, using 2.6.24 as of Linus's commit
4fa4d23fa20de67df919030c1216295664866ad7.  All of my unionfs tests pass when
unionfs is stacked on top of jffs2, other than my truncate test -- which
tries to truncate files up/down (through the union, which then is passed
through to the lower jffs2 f/s).  The same truncate test passes on all other
file systems I've tried unionfs/2.6.24 with, as well as all of the earlier
kernels that unionfs runs on (2.6.9--2.6.23).  So I tend to think this bug
is more probably due to something else going on in 2.6.24, possibly wrt
jffs2/mtd.  (Of course, it's still possible that unionfs isn't doing
something right -- any pointers?)
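
For anyone who wants to try something similar, a truncate up/down check can be sketched in user space as follows. This is not the unionfs regression test, only an illustration using POSIX calls; the function names and the list of sizes are arbitrary choices.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Truncate fd to len and confirm that fstat() reports that size.
 * Returns 0 on success, -1 on any failure or mismatch. */
static int truncate_and_check(int fd, off_t len)
{
    struct stat st;

    if (ftruncate(fd, len) != 0)
        return -1;
    if (fstat(fd, &st) != 0)
        return -1;
    return st.st_size == len ? 0 : -1;
}

/* path_template must end in "XXXXXX" and name a location on the file
 * system under test. The temporary file is removed before returning. */
int truncate_up_down(char *path_template)
{
    static const off_t sizes[] = { 0, 4096, 100, 65536, 1, 0 };
    int fd = mkstemp(path_template);
    int rc = 0;

    if (fd < 0)
        return -1;

    for (size_t i = 0; i < sizeof sizes / sizeof sizes[0]; i++) {
        if (truncate_and_check(fd, sizes[i]) != 0) {
            rc = -1;
            break;
        }
    }

    close(fd);
    unlink(path_template);
    return rc;
}
```

Pointing the template at a path inside the union mount exercises the truncate path through the stacked file system; pointing it at the lower jffs2 mount directly helps tell the two layers apart.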

The oops trace is included below.  Is this a known issue and if so, any
fixes?  If this is the first you hear of this problem, let me know and I'll
try to narrow it down further.

Thanks,
Erez.

------------[ cut here ]------------
kernel BUG at mm/filemap.c:1749!
invalid opcode: 0000 [#1] DEBUG_PAGEALLOC
Modules linked in: block2mtd mtdblock jffs2 mtd_blkdevs mtd zlib_deflate
zlib_inflate nfsd exportfs auth_rpcgss nfs lockd nfs_acl sunrpc pcnet32
CPU:    0
EIP:    0060:[<c012f03d>]    Not tainted VLI
EFLAGS: 00010287   (2.6.23-unionfs2-2.6.24-rc0-pre #9)
EIP is at iov_iter_advance+0x13/0x5d
eax: c538fdec   ebx: 00001000   ecx: c538fdec   edx: 00001000

Jason Lunz | 30 Aug 20:23

jffs2 deadlock introduced in linux 2.6.22.5


commit 1d8715b388c978b0f1b1bf4812fcee0e73b023d7 was added between
2.6.22.4 and 2.6.22.5 to cure a locking problem, but it seems to have
introduced another (worse?) one.

With a jffs2 filesystem (on block2mtd) on a 2.6.22.5 kernel, if I do
anything that appends to a file with many small writes, I get what looks
like a deadlock between the writer and the jffs2 gc thread. For example:

	# while true; do echo >> /some/file/on/jffs2; done

will result in the bash hanging in D state, with these kernel stacks in
dmesg after "echo t > /proc/sysrq-trigger":

jffs2_gcd_mtd S DFD1EEA8     0  1086      2 (L-TLB)
       dfd1eebc 00000046 00000002 dfd1eea8 dfd1eea4 00000000 00000000 c0334a00 
       c0334a00 00000000 0000000a dfcb8550 2ee3df10 0000001a 00002280 dfcb8670 
       c1407a00 00000000 00000286 df9fa600 dfe20900 ffff414a c1407ec4 0000ffff 
Call Trace:
 [<c026b84c>] __down_interruptible+0xb2/0x10b
 [<c0269e4b>] __sched_text_start+0x14b/0x8a4
 [<c0115380>] default_wake_function+0x0/0xc
 [<c026b727>] __down_failed_interruptible+0x7/0xc
 [<e09425bd>] jffs2_garbage_collect_pass+0x20/0x597 [jffs2]
 [<c0120cd0>] __dequeue_signal+0xd7/0x11c
 [<c01209ed>] recalc_sigpending+0xb/0x1d
 [<c01221e5>] dequeue_signal+0x9d/0x117
 [<e09439e7>] jffs2_garbage_collect_thread+0x11b/0x15a [jffs2]
 [<c0103bf6>] ret_from_fork+0x6/0x1c
 [<e09438cc>] jffs2_garbage_collect_thread+0x0/0x15a [jffs2]
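
The append workload from the shell loop above can also be written as a bounded C helper, which is easier to drive from a test harness. This is a sketch only; the function name is made up, and it reopens the file for every write because that is what the shell's ">>" redirection does on each iteration.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Append `count` one-byte writes (a newline each) to `path`, opening and
 * closing the file around every write. Returns 0 on success, -1 on error. */
int append_many(const char *path, unsigned long count)
{
    for (unsigned long i = 0; i < count; i++) {
        int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
        ssize_t n;

        if (fd < 0)
            return -1;
        n = write(fd, "\n", 1);
        if (close(fd) != 0 || n != 1)
            return -1;
    }
    return 0;
}
```

On an affected kernel the call would be expected to hang in D state rather than return, so run it with a bounded count and an external timeout.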

Jason Lunz | 31 Aug 23:26

Re: jffs2 deadlock introduced in linux 2.6.22.5


On Thu, Aug 30, 2007 at 11:23:55AM -0700, Jason Lunz wrote:
> commit 1d8715b388c978b0f1b1bf4812fcee0e73b023d7 was added between
> 2.6.22.4 and 2.6.22.5 to cure a locking problem, but it seems to have
> introduced another (worse?) one.

I spoke too soon. I checked more carefully, and this problem was
introduced somewhere between 2.6.21 and 2.6.22. The jffs2 fix in
2.6.22.5 isn't the culprit.

Jason

Jason Lunz | 1 Sep 21:06

[jffs2] [rfc] fix write deadlock regression


I've bisected the deadlock when many small appends are done on jffs2 down to
this commit:

commit 6fe6900e1e5b6fa9e5c59aa5061f244fe3f467e2
Author: Nick Piggin <npiggin <at> suse.de>
Date:   Sun May 6 14:49:04 2007 -0700

    mm: make read_cache_page synchronous

    Ensure pages are uptodate after returning from read_cache_page, which allows
    us to cut out most of the filesystem-internal PageUptodate calls.

    I didn't have a great look down the call chains, but this appears to fixes 7
    possible use-before uptodate in hfs, 2 in hfsplus, 1 in jfs, a few in
    ecryptfs, 1 in jffs2, and a possible cleared data overwritten with readpage in
    block2mtd.  All depending on whether the filler is async and/or can return
    with a !uptodate page.

It introduced a wait to read_cache_page, as well as a
read_cache_page_async function equivalent to the old read_cache_page
without any callers.

Switching jffs2_gc_fetch_page to read_cache_page_async for the old
behavior makes the deadlocks go away, but maybe reintroduces the
use-before-uptodate problem? I don't understand the mm/fs interaction
well enough to say.
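
The trade-off can be illustrated with a toy, single-threaded model. None of this is kernel code: the types and functions are hypothetical, and it assumes the wait added to read_cache_page is a wait for the page to be unlocked. A caller that already holds the page lock can then never get past the synchronous variant, while the asynchronous variant returns at once but may hand back a page that is not yet uptodate.

```c
#include <stdbool.h>

/* Toy page state; not the kernel's struct page. */
struct toy_page {
    bool locked;     /* someone (possibly the caller) holds the page lock */
    bool uptodate;   /* page contents are valid */
};

enum toy_result {
    TOY_OK,            /* page returned and uptodate */
    TOY_NOT_UPTODATE,  /* page returned, contents not valid yet */
    TOY_WOULD_BLOCK    /* caller would sleep until the page is unlocked */
};

/* Old behaviour: return immediately; the caller must check uptodate. */
enum toy_result toy_read_async(const struct toy_page *pg)
{
    return pg->uptodate ? TOY_OK : TOY_NOT_UPTODATE;
}

/* New behaviour: wait for the page to be unlocked first. In this
 * single-threaded toy, a locked page means the wait never ends if the
 * lock holder is the caller itself. */
enum toy_result toy_read_sync(const struct toy_page *pg)
{
    if (pg->locked)
        return TOY_WOULD_BLOCK;
    return toy_read_async(pg);
}
```

In this model, a locked and uptodate page yields TOY_WOULD_BLOCK from the synchronous read and TOY_OK from the asynchronous one, which matches the observation that switching variants makes the hang go away. A locked page that is not uptodate yields TOY_NOT_UPTODATE from the asynchronous read, which is the use-before-uptodate risk.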

Someone more knowledgeable should see if similar deadlock issues may have
been introduced for other read_cache_page callers, including the other
two in jffs2.

Nick Piggin | 2 Sep 06:20

Re: [jffs2] [rfc] fix write deadlock regression

On Sat, Sep 01, 2007 at 12:06:03PM -0700, Jason Lunz wrote:
> 
> It introduced a wait to read_cache_page, as well as a
> read_cache_page_async function equivalent to the old read_cache_page
> without any callers.
> 
> Switching jffs2_gc_fetch_page to read_cache_page_async for the old
> behavior makes the deadlocks go away, but maybe reintroduces the
> use-before-uptodate problem? I don't understand the mm/fs interaction
> well enough to say.
> 
> Someone more knowledgeable should see if similar deadlock issues may have
> been introduced for other read_cache_page callers, including the other
> two in jffs2.

Hmm, thanks for that. It does sound like it is deadlocking via
commit_write(). OTOH, it seems like it could be using the page
before it is uptodate -- it _may_ only be dealing with uptodate
data at that point... but if so, why even read_cache_page at
all?

However, it is a regression. So unless David can come up with a
more satisfactory approach, I guess we'd have to go with your
patch.

> 
>     Signed-off-by: Jason Lunz <lunz <at> falooley.org>
> 
> ---
>  fs/jffs2/fs.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/jffs2/fs.c b/fs/jffs2/fs.c
> index 1d3b7a9..8bc727b 100644
> --- a/fs/jffs2/fs.c
> +++ b/fs/jffs2/fs.c
> @@ -627,7 +627,7 @@ unsigned char *jffs2_gc_fetch_page(struct jffs2_sb_info *c,
>  	struct inode *inode = OFNI_EDONI_2SFFJ(f);
>  	struct page *pg;
>  
> -	pg = read_cache_page(inode->i_mapping, offset >> PAGE_CACHE_SHIFT,
> +	pg = read_cache_page_async(inode->i_mapping, offset >> PAGE_CACHE_SHIFT,
>  			     (void *)jffs2_do_readpage_unlock, inode);
>  	if (IS_ERR(pg))
>  		return (void *)pg;