Vara Prasad | 1 May 2007 03:32

Re: Results of 20070428 snapshot on ppc64

Hi Srini,

Thanks for your help in running the weekly tests and posting the results.

Srinivasa Ds wrote:

>Results of systemtap-20070428 snapshot on ppc64
>==============================================
>Date: 200704300942
>User: root
>Kernel: Linux 2.6.21-rc7 #5 SMP Mon Apr 23 09:48:27 IST 2007 ppc64 ppc64
>ppc64 GNU/Linux
>
>Testsuite summary of failed tests
>FAIL: systemtap.samples/lket(semantic error)
>FAIL: sysopen (1)                      (PR 3429)
>FAIL: 64-bit mmap                      (PR 4088)
>FAIL: 64-bit readwrite
>FAIL: 64-bit signal
>FAIL: 64-bit statfs
>  
>
Do you have a bug number for the above 64-bit failures?

>FAIL: 32-bit alarm                      (PR 4332)
>FAIL: 32-bit clock                      (PR 4332)
>FAIL: 32-bit mmap                       (PR 4088)
>FAIL: 32-bit readwrite
>  
>

Andi Kleen | 1 May 2007 04:57

Re: notify_page_fault() problem

> However, vmalloc_sync_all() is i386 and x86_64 specific, as is
> their change to register_page_fault_notifier().  I don't see
> other platforms doing anything special in their
> register_page_fault_notifier().

They probably just haven't tested this particular case yet.
x86 also did it originally to handle NMI notifiers, which is an x86 special
case (a nested page fault in an NMI can lead to stack corruption because
NMIs are only blocked until the next IRET).

> I have trouble believing that x86
> and ARM are unique somehow with needing to address this problem.
> Why doesn't anyone else hit this?  Is it a lurking problem or are
> there other fixes in other forms out there?

The standard kprobes notifier is not modular so it won't hit this.

> I guess part of the answer has to do with what people's expectations
> are for intercepting faults with their kprobes fault handler though.

Yes, some have pretty broad expectations.  It might be possible
to move it to a kernel-address-only path, but then some debuggers
seem to want to debug user mode too.

But you're right, there has been grumbling about the overhead
of the notifier call in the hot path.

-Andi


Quentin Barnes | 1 May 2007 05:24

Re: notify_page_fault() problem

On Tue, May 01, 2007 at 04:57:00AM +0200, Andi Kleen wrote:
>> However, vmalloc_sync_all() is i386 and x86_64 specific, as is
>> their change to register_page_fault_notifier().  I don't see
>> other platforms doing anything special in their
>> register_page_fault_notifier().
>
>They probably just haven't tested this particular case yet.

I got it to happen most often when running the syscall.exp test,
but it was still very intermittent.  I'm guessing it depends on
what else got placed on the same page as the kprobe/kretprobe data
structure (so that the page would occasionally get loaded by
coincidence and things would work), on whether the system is running
preempt-enabled, and on whether it is busy enough for another page
fault to occur before the kprobes data structure can take its
translation fault.  If the system is quiescent, this bug's not going
to show up either.  I'm currently running my lowly 64MB ARM board with
network boot _and_ swap drives, so a lot of system pounding is going
on almost all the time.

>x86 also did it originally to handle NMI notifiers, which is an
>x86 special case (a nested page fault in an NMI can lead to stack
>corruption because NMIs are only blocked until the next IRET)

Ah, ok.

>> I have trouble believing that x86
>> and ARM are unique somehow with needing to address this problem.
>> Why doesn't anyone else hit this?  Is it a lurking problem or are
>> there other fixes in other forms out there?

[Bug testsuite/4329] systemtap.samples/sysopen test fails on several arch's


------- Additional Comments From srinivasa at in dot ibm dot com  2007-05-01 08:16 -------
On ppc64 we have been seeing this bug.
Reason: the src/testsuite/systemtap.samples/sysopen.exp script runs
sysopen.stp, which in turn prints the files that were opened in 10000
ms. The expect script then expects at least 2 files to be opened in this time.
=============================================
[root@llm27lp1 stap_testing_200705010601]# cat src/testsuite/systemtap.samples/sysopen.exp

set test "sysopen"
if {![installtest_p]} { untested $test; return }
spawn stap $srcdir/$subdir/sysopen.stp
set ok 0
expect {
    -timeout 60 -re {.* opened .*\r} { incr ok; exp_continue }
    timeout { fail "$test (timeout)" }
    eof { }
}
#FIXME does not handle case of hanging sysopen.stp correctly
wait
if {$ok > 1} { pass "$test ($ok)" } { fail "$test ($ok)" }
==========================================

But on ppc64 only one file was opened, so the test fails.
===================================================
Running
/home/systemtap/tmp/stap_testing_200704300942/src/testsuite/systemtap.samples/sysopen.exp
...
irqbalance opened /proc/interrupts

Srinivasa Ds | 1 May 2007 09:21

Re: Results of 20070428 snapshot on ppc64

Jim Keniston wrote:
> On Mon, 2007-04-30 at 17:41 +0530, Srinivasa Ds wrote:
>> Results of systemtap-20070428 snapshot on ppc64
>> ==============================================
>> Date: 200704300942
>> User: root
>> Kernel: Linux 2.6.21-rc7 #5 SMP Mon Apr 23 09:48:27 IST 2007 ppc64 ppc64
>> ppc64 GNU/Linux
>>
>> Testsuite summary of failed tests
>> FAIL: systemtap.samples/lket(semantic error)
>> FAIL: sysopen (1)                      (PR 3429)
> 
> Thanks for the report.  Are you sure 3429 is the right bug number?

I am sorry, that was a typo. The bug number is 4329. I have found the
reason for the bug and proposed a solution in Bugzilla, so this
might get fixed in the next 2 or 3 days.

> It was marked fixed in January.  If this is really the bug you're seeing,
> you should either re-open it and annotate it with a description of what
> you're seeing, or submit a new problem report.
> 
> Jim
> 

nclsfabre at yahoo dot fr | 1 May 2007 11:34

[Bug kprobes/2726] systemtap.base/probefunc.exp crash in kernel/module.c:2114 on RHEL4


------- Additional Comments From nclsfabre at yahoo dot fr  2007-05-01 10:34 -------
Hi,
I've got this kernel panic on a server:
Kernel panic - not sync: kernel/module.c:2114:
spin_lock(kernel/module.c:c036b280) already locked by kernel/module.c:2114.

Here is some initial information:

[root@localhost ~]# uname -a
Linux localhost.localdomain 2.6.9-42.EL #1 Sat Aug 12 09:17:58 CDT 2006 i686 i686 i386 GNU/Linux
[root@localhost ~]# more /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 1
model name      : Intel(R) Pentium(R) 4 CPU 1.60GHz
stepping        : 2
cpu MHz         : 1595.623
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat

[Bug testsuite/4444] New: statfs failure in systemtap.syscalls

On ppc64 we have been seeing the statfs failure. This is due to a difference
between the expected and actual strings.

Systemtap.log shows
==================================
statfs: ustat (42, 0x0000000012345678) = -22 (EINVAL)
statfs: statfs ("abc", 0x0000000012345678) = -2 (ENOENT)
statfs: fstatfs (77, 0x0000000012345678) = -9 (EBADF)
statfs: exit_group (0) =
  statfs: exit (0) =
--------- EXPECTED and NOT MATCHED ----------
statfs: ustat \(42, 0x12345678\) =
statfs: statfs \("abc", 0x12345678\) =
statfs: fstatfs \(77, 0x12345678\) =
======================================

Since addresses on 64-bit systems are always 8 bytes wide for both 32-bit
and 64-bit compiled executables, we should make sure that on 64-bit systems
the right value is compared irrespective of 32-bit or 64-bit compilation.

So we should use "defined(__powerpc__) || defined(__x86_64__)" to identify a
64-bit system rather than "__WORDSIZE".
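
As a rough illustration of the proposal above, the following sketch selects the
expected address width from the architecture macros instead of __WORDSIZE. The
helper names (format_addr, expected_ustat_line) and the EXPECT_64BIT_ADDRS macro
are hypothetical and not taken from the testsuite; the output format is inferred
from the log excerpt above. Note also that __powerpc__ is defined for 32-bit
PowerPC builds as well, so treating it as "64-bit kernel" is an assumption
carried over from this report.

```c
#include <stdio.h>
#include <stddef.h>

/* Assumption from the report: either macro means the kernel under test is
 * 64-bit.  __powerpc__ is also defined for 32-bit PowerPC builds. */
#if defined(__powerpc__) || defined(__x86_64__)
#define EXPECT_64BIT_ADDRS 1
#else
#define EXPECT_64BIT_ADDRS 0
#endif

/* Hypothetical helper: render an address the way the log prints it,
 * zero-padded to 16 hex digits when wide is non-zero, else to 8 digits. */
static int format_addr(char *buf, size_t len, unsigned long long addr, int wide)
{
    return snprintf(buf, len, wide ? "0x%016llx" : "0x%08llx", addr);
}

/* Hypothetical helper: build the expected ustat line for this build. */
static int expected_ustat_line(char *buf, size_t len, int dev,
                               unsigned long long addr)
{
    char a[32];

    format_addr(a, sizeof a, addr, EXPECT_64BIT_ADDRS);
    return snprintf(buf, len, "statfs: ustat (%d, %s) =", dev, a);
}
```

With wide set, format_addr renders 0x12345678 as 0x0000000012345678, matching
the actual output in the log; with wide clear it yields 0x12345678, matching
the pattern the test currently expects.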

Test case shows 
===========================================

  ustat(42, (struct ustat *)0x12345678);
#if __WORDSIZE == 64
  // ustat (42, 0x0000000012345678) =
#else

Srinivasa Ds | 1 May 2007 13:32

Re: Results of 20070428 snapshot on ppc64

Vara Prasad wrote:
> Hi Srini,
> 
> Thanks for your help in running the weekly tests and posting the results.
> 
> Srinivasa Ds wrote:
> 
>> Results of systemtap-20070428 snapshot on ppc64
>> ==============================================
>> Date: 200704300942
>> User: root
>> Kernel: Linux 2.6.21-rc7 #5 SMP Mon Apr 23 09:48:27 IST 2007 ppc64 ppc64
>> ppc64 GNU/Linux
>>
>> Testsuite summary of failed tests
>> FAIL: systemtap.samples/lket(semantic error)
>> FAIL: sysopen (1)                      (PR 3429)
>> FAIL: 64-bit mmap                      (PR 4088)
>> FAIL: 64-bit readwrite
>> FAIL: 64-bit signal
>> FAIL: 64-bit statfs
>>  
>>
> Do you have a bug number for the above 64-bit failures?

We are investigating the errors. Most of the failures are due to
differences between the actual and expected strings (the expected strings
suit 32-bit systems but not 64-bit systems).

For example, in the statfs test:

fche at redhat dot com | 1 May 2007 14:43

[Bug testsuite/4329] systemtap.samples/sysopen test fails on several arch's


------- Additional Comments From fche at redhat dot com  2007-05-01 13:43 -------
The test case now expects >=1 instead of >1 opens.
It would be better if the test case included explicit
load generation though.
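
One possible shape for such explicit load generation is sketched below: a small
helper that opens and closes a file a fixed number of times while the probe is
running. The function name generate_opens and the choice of path are
hypothetical; how it would be hooked into sysopen.exp is not specified here.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical load generator: open and close `path` `count` times so that
 * a probe on the open syscall has something to report.  Returns the number
 * of opens that succeeded. */
static int generate_opens(const char *path, int count)
{
    int ok = 0;
    int i;

    for (i = 0; i < count; i++) {
        int fd = open(path, O_RDONLY);

        if (fd >= 0) {
            ok++;
            close(fd);
        }
    }
    return ok;
}
```

Calling it with a path that always exists, such as /dev/null, would make the
number of opens seen during the test independent of background activity like
irqbalance.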

--

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

http://sourceware.org/bugzilla/show_bug.cgi?id=4329

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

fche at redhat dot com | 1 May 2007 14:51

[Bug kprobes/2726] systemtap.base/probefunc.exp crash in kernel/module.c:2114 on RHEL4


------- Additional Comments From fche at redhat dot com  2007-05-01 13:51 -------
> Kernel panic - not sync: kernel/module.c:2114:
> spin_lock(kernel/module.c:c036b280) already locked by kernel/module.c:2114.

This sounds like an old kernel problem related to searching modules for
exception handling data.  The patch for that was in limbo IIRC.  Can someone
find e.g. the LKML discussion?  Google is failing me (gasp!).

> Here is some initial information:
> What other information do you need to investigate this bug?

http://sourceware.org/systemtap/wiki/HowToReportBugs

--

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|WORKSFORME                  |

http://sourceware.org/bugzilla/show_bug.cgi?id=2726

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

