qiaonuohan | 29 Oct 09:30 2014

[PATCH] kdump: fix to get the correct page at the edge of split files

Hello Dave,

I found a bug when analyzing split vmcore in kdump-compressed format.
Please check the patch.

-- 
Regards
Qiao Nuohan
Subramanian Karunanithi | 25 Oct 18:36 2014

Regarding crash-gcore-command

Hi,

I am using crash-gcore-command 1.2.0.
I am trying to cross-compile this tool for the PPC architecture. However, it looks like gcore_defs.h only supports x86, x86_64 and ARM.

Is there any plan to support this tool for PPC?

Regards,
Subramanian. K
HATAYAMA Daisuke | 24 Oct 12:12 2014

[ANNOUNCE] crash gcore command, version 1.3.0-rc2 is released

This is the release of crash gcore command, version 1.3.0-rc2.

Version 1.3.0 is going to add ARM64 support, including compat mode,
and PPC64 support. The purpose of this series of rc releases is
verification by other architecture maintainers. Please send me your
verification results as a reply to this mail.

The remaining changes are all bugfixes.

# The changes include those that appeared in v1.3.0-rc.

ChangeLog:

[new features]

 - Add ARM64 support. In addition to a native ARM64 build, an x86_64
   executable of the crash gcore command for ARM64 crash dumps can be
   built with make target=ARM64, just like the crash utility.
   (anderson <at> redhat.com)

 - Add ARM64 compat mode support. This allows gcore to create
   corefiles for tasks running in 32-bit compatible mode on ARM64.
   (weishu <at> marvell.com)

 - Add PPC64 support. This includes both big-endian and little-endian
   formats.
   (mtoman <at> redhat.com, anderson <at> redhat.com)

[bugfixes]

 - Correct the read buffer size for NT_FPREGSET to sizeof(struct
   user_i387_struct). So far we had incorrectly used sizeof(union
   thread_xstate) as the read buffer size, which happened to be equal
   to sizeof(struct user_i387_struct). However, the following patch
   extended union thread_xstate, and sizeof(union thread_xstate) became
   larger than sizeof(struct user_i387_struct):

    commit e7d820a5e549b3eb6c3f9467507566565646a669
    Author: Qiaowei Ren <qiaowei.ren <at> intel.com>
    Date:   Thu Dec 5 17:15:34 2013 +0800

        x86, xsave: Support eager-only xsave features, add MPX support

        Some features, like Intel MPX, work only if the kernel uses eagerfpu
        model.  So we should force eagerfpu on unless the user has explicitly
        disabled it.

        Add definitions for Intel MPX and add it to the supported list.

        [ hpa: renamed XSTATE_FLEXIBLE to XSTATE_LAZY and added comments ]

        Signed-off-by: Qiaowei Ren <qiaowei.ren <at> intel.com>
        Link: http://lkml.kernel.org/r/9E0BE1322F2F2246BD820DA9FC397ADE014A6115 <at> SHSMSX102.ccr.corp.intel.com
        Signed-off-by: H. Peter Anvin <hpa <at> linux.intel.com>

   Without this patch, for vmcores whose kernel versions are v3.14 or
   later, gcore results in a segmentation fault due to a buffer
   overwrite of NT_FPREGSET.
   (d.hatayama <at> jp.fujitsu.com)

 - Although ELF_DATA is defined in gcore_defs.h, ELFDATA2LSB was used
   directly in elf{64,32}_fill_elf_header(). This had caused no
   problem so far because the existing supported architectures are all
   little-endian. Fix this to support PPC64, which can also be
   big-endian.
   (anderson <at> redhat.com)

 - Fix a bug that the registers in the NT_PRSTATUS note information
   are broken. This bug has existed since v1.2.2, when O(1) note
   information collection was added. Without this fix, we can never
   get reliable register values for failure analysis.
   (weishu <at> marvell.com)

 - Fix a bug that NT_386_IOPERM note information is not collected. So
   far, ioperm_get() had always returned 1. As a result, NT_386_IOPERM
   note information had never been included in a generated core file,
   even if it was available for a given task in a given crash dump.
   (d.hatayama <at> jp.fujitsu.com)

 - Add new member offset initialization for struct
   nsproxy::pid_ns_for_children. In upstream, the following patch
   renamed struct nsproxy::pid_ns to struct
   nsproxy::pid_ns_for_children:

    $ git log -1 c2b1df2e
    commit c2b1df2eb42978073ec27c99cc199d20ae48b849
    Author: Andy Lutomirski <luto <at> amacapital.net>
    Date:   Thu Aug 22 11:39:16 2013 -0700

        Rename nsproxy.pid_ns to nsproxy.pid_ns_for_children

        nsproxy.pid_ns is *not* the task's pid namespace. The name
        should clarify that.

        This makes it more obvious that setns on a pid namespace is weird --
        it won't change the pid namespace shown in procfs.

        Signed-off-by: Andy Lutomirski <luto <at> amacapital.net>
        Reviewed-by: "Eric W. Biederman" <ebiederm <at> xmission.com>
        Signed-off-by: David S. Miller <davem <at> davemloft.net>

   Without this fix, gcore exits abnormally during its initialization,
   so a core file is never generated.
   (d.hatayama <at> jp.fujitsu.com)

 - Fix a wrong way of checking the return value of fopen(). fopen()
   returns NULL in case of error, but gcore had treated it as
   returning a negative integer. As a result, gcore continued
   execution after the check even in case of error and then exited
   abnormally at the first call of fwrite() with the invalid file
   pointer.

   From the user's viewpoint, this bug shows up when trying to
   overwrite an existing corefile that requires more privileged
   permissions, which results in an EPERM failure.
   (d.hatayama <at> jp.fujitsu.com)
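
As an illustration of the fopen() item above, here is a minimal sketch of
the corrected kind of check. It is not taken from the gcore source:
open_corefile() and its saved_errno parameter are hypothetical names used
only for this example.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical helper (not from the gcore source): open a core file for
 * writing and report failure the way fopen() actually signals it.
 * fopen() returns a FILE pointer, so the error value is NULL; testing
 * the result as if it were a negative integer never detects the failure.
 */
static FILE *open_corefile(const char *path, int *saved_errno)
{
    FILE *fp = fopen(path, "w");

    if (fp == NULL) {
        /* Save errno immediately, before any other library call. */
        *saved_errno = errno;
        return NULL;
    }

    *saved_errno = 0;
    return fp;
}
```

A caller would stop as soon as open_corefile() returns NULL and report
strerror(saved_errno), instead of passing the pointer on to fwrite().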

MD5 CheckSum:

$ md5sum ./crash-gcore-command-1.3.0-rc2.tar.gz
07757d2ee044b19cac6b652de0d757fc  ./crash-gcore-command-1.3.0-rc2.tar.gz

--
Thanks.
HATAYAMA, Daisuke
Attachment (crash-gcore-command-1.3.0-rc2.tar.gz): application/octet-stream, 79 KiB
Petr Tesarik | 24 Oct 11:09 2014

Kernel dump file access library

Hi all,

during this year's SUSE HackWeek, my colleague started work on enabling
kernel core files in gdb. I realized that there would be at least four
different programs implementing read access to kernel dump files:

  1. the crash utility
  2. makedumpfile (when re-filtering)
  3. kdumpid (my project to get kernel version from a dump file)
  4. gdb-kdump (started by my colleague during HackWeek)

At this point, I felt that's too much re-inventing the wheel again and
again, so I took my current code from kdumpid and adapted it as a
library that can be used by everybody:

    https://github.com/ptesarik/libkdumpfile

In its current shape, it's usable, but far from complete.

Things that work already:
   - identify kdump file format
   - parse meta-information from the header
   - open ELF, diskdump, makedumpfile, LKCD
   - read data by physical address (incl. Xen Dom0)
   - read data by Xen machine address

Things still on my TODO list:
   - more formats: sadump, kvmdump, libvirt, xc_core, xc_save
   - determine phys_base in ELF files
   - determine kernel release if not found in headers

Ideally, I would like to replace all current implementations with this
library, so if a new file format appears, or a new feature is added to
one of the files, it can be immediately used by all kdump-related tools.

Please let me know what you think.
Oh, and if you're developing such a tool, let me know which features
should be added.
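
To make the first item of the "work already" list a bit more concrete,
here is a rough sketch of identifying a dump format from its leading
bytes. This is NOT the libkdumpfile API: guess_dump_format() and the enum
are hypothetical names, and the diskdump and kdump-compressed signature
strings are assumptions based on the 8-byte signatures checked by the
crash utility. LKCD and the other formats are left out.

```c
#include <stddef.h>
#include <string.h>

enum dump_format {
    FMT_UNKNOWN,
    FMT_ELF,
    FMT_DISKDUMP,
    FMT_KDUMP_COMPRESSED
};

/*
 * Hypothetical helper, not part of libkdumpfile: guess the dump format
 * from the first bytes of the file.  The ELF magic is standard; the two
 * 8-byte signatures are assumed values (see the note above).
 */
static enum dump_format guess_dump_format(const unsigned char *hdr, size_t len)
{
    if (len >= 4 && memcmp(hdr, "\177ELF", 4) == 0)
        return FMT_ELF;
    if (len >= 8 && memcmp(hdr, "DISKDUMP", 8) == 0)
        return FMT_DISKDUMP;
    if (len >= 8 && memcmp(hdr, "KDUMP   ", 8) == 0)
        return FMT_KDUMP_COMPRESSED;
    return FMT_UNKNOWN;
}
```

A real implementation has to look further than the signature, for example
at header versions and sub-formats, which is the part a shared library
would save each tool from repeating.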

Regards,
Petr Tesarik

"Zhou, Wenjian/周文剑" | 22 Oct 08:32 2014

add support for incomplete elf dump file

An incomplete dump file produced after an ENOSPC error can't be analysed by
the crash utility. However, such a file may still contain important
information, and the panic may not be reproducible. Our first idea was to
modify the existing data of the incomplete dump file so that the crash
utility could analyse it.
However, we found it more suitable for crash to check for incomplete data
than to modify the data in makedumpfile.
So, when the incomplete flag is set, we adjust the p_filesz of the PT_LOAD
header and the zero_fill and phys_end of the PT_LOAD segments, so that crash
can analyse an incomplete ELF dump file.

The issue was discussed at:
	http://lists.infradead.org/pipermail/kexec/2014-October/012669.html
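
The adjustment described above can be sketched in isolation as follows.
This is a simplified stand-in, not the code in netdump.c: struct
load_segment and clamp_segment() are hypothetical names, and only the
field names (file_offset, phys_start, phys_end, zero_fill) follow the
patch below.

```c
#include <stdint.h>

/* Hypothetical, simplified view of one PT_LOAD segment. */
struct load_segment {
    uint64_t file_offset;  /* where the segment data starts in the file */
    uint64_t phys_start;   /* first physical address covered */
    uint64_t phys_end;     /* end of the data actually backed by the file */
    uint64_t zero_fill;    /* set to the original phys_end when truncated */
};

/*
 * Clamp a segment to the data really present in a file of file_size
 * bytes.  Returns the number of segment bytes that can still be read.
 */
static uint64_t clamp_segment(struct load_segment *seg, uint64_t file_size)
{
    uint64_t wanted = seg->phys_end - seg->phys_start;
    uint64_t present;

    if (seg->file_offset + wanted <= file_size)
        return wanted;              /* segment is complete, nothing to do */

    if (seg->file_offset > file_size) {
        /* Nothing of this segment made it into the file. */
        seg->file_offset = 0;
        seg->phys_start = 0;
        seg->phys_end = 0;
        return 0;
    }

    /* Partially written: keep only what is actually in the file. */
    present = file_size - seg->file_offset;
    seg->zero_fill = seg->phys_end;
    seg->phys_end = seg->phys_start + present;
    return present;
}
```

For example, a segment covering 0x1000 bytes that starts at file offset
100 in a file of 100 + 0x800 bytes keeps its first 0x800 bytes, and its
original end is remembered in zero_fill.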

--- a/netdump.c
+++ b/netdump.c
@@ -52,6 +52,7 @@ static char *vmcoreinfo_read_string(const char *);

  #define MIN_PAGE_SIZE (4096)

+#define DUMP_ELF_INCOMPLETE 0x1
  /*
   * Architectures that have configurable page sizes,
   * can differ from the host machine's page size.
@@ -488,6 +489,10 @@ check_dumpfile_size(char *file)
         if (stat64(file, &stat) < 0)
                 return;

+       Elf64_Phdr *load64 = nd->load64;
+       Elf32_Phdr *load32 = nd->load32;
+       unsigned int e_flag = (NULL == nd->elf64) ? (nd->elf32)->e_flags : (nd->elf64)->e_flags;
+       int status = e_flag & DUMP_ELF_INCOMPLETE;
         for (i = 0; i < nd->num_pt_load_segments; i++) {
                 pls = &nd->pt_load_segments[i];

@@ -495,7 +500,19 @@ check_dumpfile_size(char *file)
                         (pls->phys_end - pls->phys_start);

                 if (segment_end > stat.st_size) {
-                       error(WARNING, "%s: may be truncated or incomplete\n"
+                       if (!status){
+                               error(WARNING, "%s: may be truncated\n"
+                                       "         PT_LOAD p_offset: %lld\n"
+                                       "                 p_filesz: %lld\n"
+                                       "           bytes required: %lld\n"
+                                       "            dumpfile size: %lld\n\n",
+                                       file, pls->file_offset,
+                                       pls->phys_end - pls->phys_start,
+                                       segment_end, stat.st_size);
+                               return;
+                       }
+                       else{
+                               error(WARNING, "%s: may be incomplete\n"
                                 "         PT_LOAD p_offset: %lld\n"
                                 "                 p_filesz: %lld\n"
                                 "           bytes required: %lld\n"
@@ -503,8 +520,25 @@ check_dumpfile_size(char *file)
                                 file, pls->file_offset,
                                 pls->phys_end - pls->phys_start,
                                 segment_end, stat.st_size);
-                       return;
+                       }
+                       if (pls->file_offset > stat.st_size){
+                                pls->file_offset = 0;
+                                pls->phys_start = 0;
+                                pls->phys_end = 0;
+                        }
+                        else {
+                               if (NULL == load32)
+                                       load64->p_filesz = stat.st_size - pls->file_offset;
+                               else
+                                       load32->p_filesz = stat.st_size - pls->file_offset;
+                               pls->zero_fill = pls->phys_end;
+                                pls->phys_end = stat.st_size - pls->file_offset + pls->phys_start;
+                       }
                 }
+               if (NULL == load32)
+                       load64++;
+               else
+                       load32++;
         }
  }

Karlsson, Jan | 21 Oct 12:46 2014

Crash in crash

Hi Dave

I have a vmcore file for ARM64 that crashes Crash during startup. The core file is created at a hardware watchdog (I believe) so there is no panic message or something similar in the log.

This is the printout from Crash running under gdb, after the copyrights and config information:

please wait... (determining panic task)
Program received signal SIGSEGV, Segmentation fault.
0x000000000047ed40 in tgid_quick_search (tgid=5040) at memory.c:4114
4114            if (tgid == last->tgid) {

(gdb) bt
#0  0x000000000047ed40 in tgid_quick_search (tgid=5040) at memory.c:4114
#1  0x000000000047f046 in get_task_mem_usage (task=18446743799318107136, tm=0x7fffffff6f40)
    at memory.c:4186
#2  0x000000000047c679 in vm_area_dump (task=18446743799318107136, flag=10, vaddr=0, ref=0x0)
    at memory.c:3671
#3  0x000000000047ec08 in in_user_stack (task=18446743799318107136, vaddr=0) at memory.c:4063
#4  0x00000000004fd9fe in arm64_get_dumpfile_stackframe (frame=<synthetic pointer>,
    bt=<optimized out>) at arm64.c:1077
#5  arm64_get_stack_frame (bt=0x7fffffffc690, pcp=0x7fffffff9560, spp=0x7fffffff9568)
    at arm64.c:1103
#6  0x00000000004de409 in back_trace (bt=0x7fffffffc690) at kernel.c:2533
#7  0x00000000004d1563 in foreach (fd=0x7fffffffc7c0) at task.c:6161
#8  0x00000000004d2bbd in panic_search () at task.c:6425
#9  0x00000000004d4454 in get_panic_context () at task.c:5364
#10 task_init () at task.c:491
#11 0x000000000046146e in main_loop () at main.c:801
#12 0x00000000006467a3 in captured_command_loop (data=<optimized out>) at main.c:258
#13 0x000000000064535b in catch_errors (func=0x646790 <captured_command_loop>, func_args=0x0,
    errstring=0x873235 "", mask=6) at exceptions.c:557
#14 0x0000000000647726 in captured_main (data=<optimized out>) at main.c:1064
#15 0x000000000064535b in catch_errors (func=0x646aa0 <captured_main>, func_args=0x7fffffffe030,
    errstring=0x873235 "", mask=6) at exceptions.c:557
#16 0x0000000000647a84 in gdb_main (args=<optimized out>) at main.c:1079
#17 0x0000000000647abe in gdb_main_entry (argc=<optimized out>, argv=<optimized out>)
    at main.c:1099
#18 0x000000000045f61f in main (argc=3, argv=0x7fffffffe188) at main.c:758

(gdb) p tt->last_tgid
$1 = (struct tgid_context *) 0x0

Source code for tgid_quick_search:

static struct tgid_context *
tgid_quick_search(ulong tgid)
{
        struct tgid_context *last, *next;

        tt->tgid_searches++;

        last = tt->last_tgid;
        if (tgid == last->tgid) {
                tt->tgid_cache_hits++;
                return last;
        }
  ....
}

So 'last' becomes 0 which causes the crash.

After some more investigation I have seen that "tt->last_tgid" is initialized in function sort_tgid_array in task.c, but that function seems to be called at a later stage.

By adding a line in tgid_quick_search:

static struct tgid_context *
tgid_quick_search(ulong tgid)
{
        struct tgid_context *last, *next;

        tt->tgid_searches++;

        if (tt->last_tgid == 0) sort_tgid_array(); // added line
        last = tt->last_tgid;
        if (tgid == last->tgid) {
                tt->tgid_cache_hits++;
                return last;
        }
  ...

I can run Crash on this core file. However I do not know if this is the best way to fix the problem.
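
The general pattern behind the added line, initializing a cached pointer
on first use instead of assuming an earlier setup step has already run,
can be shown with a small standalone sketch. None of this is crash code:
struct tgid_cache, struct tgid_entry and cache_lookup() are hypothetical
names, and a plain linear scan stands in for the real search.

```c
#include <stddef.h>

struct tgid_entry {
    unsigned long tgid;
};

/* Hypothetical cache; "last" stays NULL until the first lookup. */
struct tgid_cache {
    struct tgid_entry *entries;
    size_t count;
    struct tgid_entry *last;
    unsigned long searches;
    unsigned long hits;
};

static struct tgid_entry *cache_lookup(struct tgid_cache *c, unsigned long tgid)
{
    size_t i;

    c->searches++;

    /* Guard: never dereference "last" before it has been set up. */
    if (c->last == NULL) {
        if (c->count == 0)
            return NULL;
        c->last = &c->entries[0];
    }

    if (c->last->tgid == tgid) {
        c->hits++;
        return c->last;
    }

    /* Linear scan as a placeholder for the real search. */
    for (i = 0; i < c->count; i++) {
        if (c->entries[i].tgid == tgid) {
            c->last = &c->entries[i];
            return c->last;
        }
    }
    return NULL;
}
```

Whether the lookup should initialize the cache itself, or whether the
initialization should simply happen earlier during startup, is a design
choice this sketch does not settle.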

 

Jan

Jan Karlsson
Senior Software Engineer
System Assurance

Sony Mobile Communications
Tel: +46 703 062 174
jan.karlsson <at> sonymobile.com

sonymobile.com

Dave Anderson | 17 Oct 17:12 2014

[PATCH] crash-gcore-command extension module: PPC64 support


Hello Daisuke,

Attached is a patch that introduces support for the PPC64 architecture.
The patch was written by Michal Toman (mtoman <at> redhat.com).  It is based
upon crash-gcore-command-1.3.0-rc.

The patch supports both big-endian and little-endian formats.  However,
it does require the ELF_DATA fix to elf64_fill_elf_header() that I reported
yesterday.  I have attached a separate patch to fix elf64_fill_elf_header()
and elf32_fill_elf_header().

Please include these two patches in crash-gcore-command-1.3.0.

Thanks,
  Dave
Attachment (ELF_DATA.patch): text/x-patch, 1300 bytes
Attachment (add-ppc64-v5.patch): text/x-patch, 8 KiB

Karlsson, Jan | 15 Oct 11:19 2014

FW: Number of cpus on ARM

Hi,

Unfortunately, I found another, older example where my patch below did not work.
In that one, only cpu 0 was online but cpus 0, 1, 2 and 3 were active. So maybe:

        return MAX(get_cpus_active(), get_highest_cpu_online()+1);

might work better. Someone with better knowledge of this than I have should look at the problem.

 

Jan

Jan Karlsson
Senior Software Engineer
System Assurance

Sony Mobile Communications
Tel: +46 703 062 174
jan.karlsson <at> sonymobile.com

sonymobile.com

 

 

From: Karlsson, Jan
Sent: 15 October 2014 10:49
To: Discussion list for crash utility usage, maintenance and development
Subject:

 

Hi,

I have seen a problem with the number of cpus reported for ARM (32-bit).

static int
arm_get_smp_cpus(void)
{
        return MAX(get_cpus_active(), get_cpus_online());
}

In one of my examples, "help -k" gives me:

       cpu_possible_map: 0 1 2 3
        cpu_present_map: 0 1 2 3
         cpu_online_map: 0 3
         cpu_active_map: 3

So the number of cpus becomes 2. However, there is code in a number of places that will then only accept cpu 0 and 1 as cpus to handle.

When I changed the code to be the same as for ARM64, things worked as expected:

static int
arm_get_smp_cpus(void)
{
        return MAX(get_cpus_online(), get_highest_cpu_online()+1);
}

Jan

Jan Karlsson
Senior Software Engineer
System Assurance

Sony Mobile Communications
Tel: +46 703 062 174
jan.karlsson <at> sonymobile.com

sonymobile.com

 

 

Karlsson, Jan | 15 Oct 10:49 2014

(no subject)

Hi,

I have seen a problem with the number of cpus reported for ARM (32-bit).

static int
arm_get_smp_cpus(void)
{
        return MAX(get_cpus_active(), get_cpus_online());
}

In one of my examples, "help -k" gives me:

       cpu_possible_map: 0 1 2 3
        cpu_present_map: 0 1 2 3
         cpu_online_map: 0 3
         cpu_active_map: 3

So the number of cpus becomes 2. However, there is code in a number of places that will then only accept cpu 0 and 1 as cpus to handle.

When I changed the code to be the same as for ARM64, things worked as expected:

static int
arm_get_smp_cpus(void)
{
        return MAX(get_cpus_online(), get_highest_cpu_online()+1);
}

 

Jan

Jan Karlsson
Senior Software Engineer
System Assurance

Sony Mobile Communications
Tel: +46 703 062 174
jan.karlsson <at> sonymobile.com

sonymobile.com

 

 

Dave Anderson | 14 Oct 22:37 2014

HEADS UP: crash will fail to initialize with upstream CONFIG_SLAB kernels


Just a heads-up to those who may be running with bleeding-edge upstream
kernels that have this commit:

   commit bf0dea23a9c094ae869a88bb694fbe966671bf6d
   Author: Joonsoo Kim <iamjoonsoo.kim <at> lge.com>
   Date: Thu Oct 9 15:26:27 2014 -0700

   mm/slab: use percpu allocator for cpu cache

This change will cause the crash session to fail during initialization
if the target kernel is configured with CONFIG_SLAB.  I haven't tried
it, but it looks like it would fail with an invalid structure member
offset message with respect to "kmem_cache_s_array".  To work around it,
you could try using the --no_kmem_cache command line option.

Since RHEL and Fedora use CONFIG_SLUB, it wouldn't be an issue with
those distributions.

Dave

[ANNOUNCE] crash gcore command, version 1.3.0-rc is released

This is the release of crash gcore command, version 1.3.0-rc.

Version 1.3.0 is going to add ARM64 support, and the purpose of this
rc release is to allow verification by other architecture
maintainers. Please send me your verification results as a reply to
this mail.

The remaining changes are all bugfixes.

ChangeLog:

 - Add ARM64 support. In addition to a native ARM64 build, an x86_64
   executable of the crash gcore command for ARM64 crash dumps can be
   built with make target=ARM64, just like the crash utility itself.
   (anderson <at> redhat.com)

 - Fix a bug where the registers in the NT_PRSTATUS note information
   are broken. The bug has been present since v1.2.2, when O(1) note
   information collection was added. Without this fix, reliable
   register values for failure analysis cannot be obtained.
   (weishu <at> marvell.com)

 - Fix a bug where NT_386_IOPERM note information is not collected. So
   far, ioperm_get() had always returned 1. As a result, NT_386_IOPERM
   note information had never been included in a generated core
   file even if it was available for a given task in a given crash
   dump.
   (d.hatayama <at> jp.fujitsu.com)

 - Add new member offset initialization for struct
   nsproxy::pid_ns_for_children. Upstream, the following patch
   renamed struct nsproxy::pid_ns to struct
   nsproxy::pid_ns_for_children.

    $ git log -1 c2b1df2e
    commit c2b1df2eb42978073ec27c99cc199d20ae48b849
    Author: Andy Lutomirski <luto <at> amacapital.net>
    Date:   Thu Aug 22 11:39:16 2013 -0700

        Rename nsproxy.pid_ns to nsproxy.pid_ns_for_children

        nsproxy.pid_ns is *not* the task's pid namespace.  The name should
        clarify that.

        This makes it more obvious that setns on a pid namespace is weird --
        it won't change the pid namespace shown in procfs.

        Signed-off-by: Andy Lutomirski <luto <at> amacapital.net>
        Reviewed-by: "Eric W. Biederman" <ebiederm <at> xmission.com>
        Signed-off-by: David S. Miller <davem <at> davemloft.net>

   Without this fix, gcore exits abnormally during initialization,
   so no core file is ever generated.
   (d.hatayama <at> jp.fujitsu.com)

 - Fix an incorrect check of the return value of fopen(). fopen()
   returns NULL on error, but gcore treated it as returning a
   negative integer. As a result, gcore continued execution after
   the check even on error and then exited abnormally at the first
   fwrite() call on the file pointer it had failed to open.

   From the user's viewpoint, this bug shows up when trying to
   overwrite an existing corefile that requires more privileged
   permissions, which results in an EPERM failure.

   (d.hatayama <at> jp.fujitsu.com)

MD5 CheckSum:

$ md5sum ./crash-gcore-command-1.3.0-rc.tar.gz
0b841985c084e790966800edfd1b5d43  ./crash-gcore-command-1.3.0-rc.tar.gz

--
Thanks.
HATAYAMA, Daisuke
