Jun Koi | 1 Oct 2008 05:12

Re: question on some command params

On Tue, Sep 30, 2008 at 10:47 PM, Dave Anderson <anderson <at> redhat.com> wrote:
> Dave Anderson wrote:
>>
>> Jun Koi wrote:
>>
>>> On Tue, Sep 23, 2008 at 12:49 AM, Dave Anderson <anderson <at> redhat.com>
>>> wrote:
>>>
>>>> Jun Koi wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I found below cmdline params having no documentation anywhere, so
>>>>> could somebody explain their meaning?
>>>>>
>>>>> - memory_module
>>>>> - no_modules
>>>>> - no_ikconfig
>>>>> - no_namelist_gzip
>>>>> - no_kmem_cache
>>>>> - kmem_cache_delay
>>>>> - readnow
>>>>> - buildinfo
>>>>> - zero_excluded
>>>>>
>>>>>
>>>>> Many thanks,
>>>>> J
>>>>
>>>>

Jun Koi | 1 Oct 2008 08:16

crash versioning?

Hi,

I notice that the way Dave names crash versions is a bit unusual (I have
never seen it anywhere else): 4.0-7.1, 4.0-7.2, .... What is the point of
naming versions that way?

Thanks,
J

Dave Anderson | 1 Oct 2008 14:54

Re: question on some command params

Jun Koi wrote:
> Great, it is clear to me now!
> 
> I have another question: what is the purpose of the "-L" option?
> 
> Thanks,
> Jun

It tries to lock all of the crash utility's mapped pages into
memory and prevents them from being paged out during the crash
session.  (man mlockall)

                 case 'L':
                         if (mlockall(MCL_CURRENT|MCL_FUTURE) == -1)
                                 perror("mlockall");
                         break;

It's fairly useless unless perhaps you are debugging a live
system and don't want any paging-out activity of the crash
session itself to interfere with whatever you might be looking
at on the live system.  Or you may just want better response
during live system analysis on a heavily-loaded system.
I don't recall what I was doing that led me to add it.
It's a debug leftover that should not be used unless you
really have a need for it.

Dave
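
For readers who want to try the call outside of crash, here is a minimal standalone sketch of the same mlockall() usage. The helper name lock_process_memory() is hypothetical and not part of crash; only mlockall(), MCL_CURRENT, MCL_FUTURE and perror() are real interfaces. The call can fail without sufficient privilege or when RLIMIT_MEMLOCK is too low, so the sketch reports the failure rather than aborting.

```c
#include <stdio.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (not part of crash): try to pin every page the
 * process has mapped now (MCL_CURRENT) and every page it maps later
 * (MCL_FUTURE). Returns 0 on success, -1 on failure after printing
 * the reason.
 */
int lock_process_memory(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) == -1) {
        /* Common causes: EPERM (no privilege), ENOMEM (RLIMIT_MEMLOCK). */
        perror("mlockall");
        return -1;
    }
    return 0;
}
```

A caller would invoke lock_process_memory() once at startup and may undo it later with munlockall().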

Dave Anderson | 1 Oct 2008 15:01

Re: crash versioning?

Jun Koi wrote:
> Hi,
> 
> I notice that the way Dave name crash version is a bit special (never
> seen anywhere for me) : 4.0-7.1, 4.0-7.2, .... What is the point of
> naming versions that way??
> 
> Thanks,
> J

It's just rpm's n-v-r (name-version-release) convention.

It does get driven somewhat by the Red Hat internal packaging
conventions/requirements.  But don't look for any particular
rhyme or reason.
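
To illustrate the n-v-r idea, the sketch below splits a string such as "crash-4.0-7.2" into name "crash", version "4.0" and release "7.2" by cutting at the last two hyphens. The function split_nvr() is a hypothetical helper written for this note, not an rpm or crash API, and it assumes the version and release fields themselves contain no hyphens.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: split "name-version-release" at its last two
 * hyphens (the name itself may contain hyphens). Each output buffer
 * holds `cap` bytes. Returns 0 on success, -1 if the string does not
 * contain two separating hyphens, the name is empty, or a field does
 * not fit.
 */
int split_nvr(const char *nvr, char *name, char *version,
              char *release, size_t cap)
{
    const char *rel = strrchr(nvr, '-');   /* hyphen before the release */
    if (rel == NULL || rel == nvr)
        return -1;

    /* Walk backwards to the hyphen before the version. */
    const char *ver = rel - 1;
    while (ver > nvr && *ver != '-')
        ver--;
    if (*ver != '-' || ver == nvr)
        return -1;

    size_t name_len = (size_t)(ver - nvr);
    size_t ver_len = (size_t)(rel - ver - 1);
    size_t rel_len = strlen(rel + 1);
    if (name_len >= cap || ver_len >= cap || rel_len >= cap)
        return -1;

    memcpy(name, nvr, name_len);
    name[name_len] = '\0';
    memcpy(version, ver + 1, ver_len);
    version[ver_len] = '\0';
    memcpy(release, rel + 1, rel_len);
    release[rel_len] = '\0';
    return 0;
}
```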

Worth, Kevin | 1 Oct 2008 21:19

"cannot access vmalloc'd module memory" when loading kdump'ed vmcore in crash

Hello kexec and crash mailing lists,

Sorry to spam whoever’s code this ISN’T an issue with, but I really am unsure whether this is a kdump or a crash issue. I am running Ubuntu 7.04 with a 2.6.20 kernel (includes Ubuntu’s patches; source at http://packages.ubuntu.com/feisty/linux-source-2.6.20 ) and a modified VMSPLIT/PAGE_OFFSET value (see bottom for details) on an i386 machine with 4GB of memory. At first I thought this could be an issue with makedumpfile stripping out things it shouldn’t, but I’ve found that setting up my initrd script so that it simply performs “cp /proc/vmcore /var/crash/vmcore” results in the same issue.

I’ve tried this with both crash 4.0-6.3 and 4.0-7.2 and get the same result. Unfortunately I’m locked at kernel 2.6.20 for other reasons, or else I would try a newer one.

If anyone can offer suggestions of what to try, please let me know. If this is something that has already been resolved elsewhere, sorry to waste time, and if someone can point me to what resolved it, perhaps I can look at backporting the fix myself. Thanks for your time.

crash-4.0-7.2$ ./crash ~/vmcore ~/targetfiles/vmlinux-2.6.20-17.39-custom2

crash 4.0-7.2
<snip>Copyright notices…</snip>
GNU gdb 6.1
<snip>Copyright notices…</snip>
This GDB was configured as "i686-pc-linux-gnu"...

please wait... (gathering module symbol data)
WARNING: cannot access vmalloc'd module memory

      KERNEL: /home/worthk/targetfiles/vmlinux-2.6.20-17.39-custom2
    DUMPFILE: /home/worthk/vmcore
        CPUS: 2
        DATE: Wed Oct  1 12:30:50 2008
      UPTIME: 00:35:11
LOAD AVERAGE: 0.07, 0.09, 0.08
       TASKS: 94
    NODENAME: test-module
     RELEASE: 2.6.20-17.39-custom2
     VERSION: #3 SMP Wed Sep 24 10:11:03 PDT 2008
     MACHINE: i686  (2200 Mhz)
      MEMORY: 5 GB
<6>SysRq : Trigger a crashdump"
         PID: 4304
     COMMAND: "bash"
        TASK: 5d7e9030  [THREAD_INFO: f4b70000]
         CPU: 0
       STATE: TASK_RUNNING (SYSRQ)

crash> mod -s test
mod: cannot access vmalloc'd module memory

My kernel config is a bit outside the norm, in that the VMSPLIT value has been modified to give 3GB of memory to kernel space and 1GB of memory to user space. Below is a diff between the default Ubuntu “generic” config and mine:

diff /boot/config-2.6.20-17-generic /boot/config-2.6.20-17.37-custom2
3,4c3,4
< # Linux kernel version: 2.6.20-17-generic
< # Wed Aug 20 14:43:36 2008
---
> # Linux kernel version: 2.6.20-17.37-custom2
> # Tue Aug 19 18:50:53 2008
33c33
< CONFIG_VERSION_SIGNATURE="Ubuntu 2.6.20-17.39-generic"
---
> CONFIG_VERSION_SIGNATURE="Ubuntu 2.6.20-17.37-generic"
51c51
< # CONFIG_EMBEDDED is not set
---
> CONFIG_EMBEDDED=y
188,190c188,194
< CONFIG_HIGHMEM4G=y
< # CONFIG_HIGHMEM64G is not set
< CONFIG_PAGE_OFFSET=0xC0000000
---
> # CONFIG_HIGHMEM4G is not set
> CONFIG_HIGHMEM64G=y
> # CONFIG_VMSPLIT_3G is not set
> # CONFIG_VMSPLIT_3G_OPT is not set
> # CONFIG_VMSPLIT_2G is not set
> CONFIG_VMSPLIT_1G=y
> CONFIG_PAGE_OFFSET=0x40000000
191a196
> CONFIG_X86_PAE=y
204c209
< # CONFIG_RESOURCES_64BIT is not set
---
> CONFIG_RESOURCES_64BIT=y
1161a1167
> CONFIG_IDE_MAX_HWIFS=4
1443a1450
> # CONFIG_PATA_PLATFORM is not set
1525a1533
> CONFIG_I2O_EXT_ADAPTEC_DMA64=y

Kevin Worth
Network Security Software Engineer
ProCurve networking by HP
kevin.worth <at> hp.com
ph 916.785.4528
fx 916.785.1196
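
As a quick check of what the CONFIG_PAGE_OFFSET change in the diff means, the sketch below computes how a 32-bit virtual address space divides at PAGE_OFFSET: everything below it is user space and everything from it upward is kernel space. The function names user_span() and kernel_span() are hypothetical helpers for illustration only. With 0x40000000 the result is 1 GiB user and 3 GiB kernel, and with the default 0xC0000000 it is the reverse.

```c
#include <stdint.h>

#define ADDRESS_SPACE_32 (1ULL << 32)   /* 4 GiB of 32-bit virtual space */

/* Bytes of virtual address space below PAGE_OFFSET (user space). */
uint64_t user_span(uint32_t page_offset)
{
    return page_offset;
}

/* Bytes of virtual address space at or above PAGE_OFFSET (kernel space). */
uint64_t kernel_span(uint32_t page_offset)
{
    return ADDRESS_SPACE_32 - page_offset;
}
```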

Dave Anderson | 1 Oct 2008 21:43

Re: "cannot access vmalloc'd module memory" when loading kdump'ed vmcore in crash

Worth, Kevin wrote:
> Hello kexec and crash mailing lists,
> 
>  
> 
> Sorry to spam whoever’s code this ISN’T an issue with, but I really am 
> unsure of whether is a kdump or a crash issue. I am running an Ubuntu 
> 7.04 with a 2.6.20 kernel (includes Ubuntus patches- source at 
> http://packages.ubuntu.com/feisty/linux-source-2.6.20 ) and a modified 
> VMSPLIT/PAGE_OFFSET value (see bottom for details) on an i386 machine 
> with 4GB of memory. At first I thought this could be an issue with 
> makedumpfile stripping out things it shouldn’t, but I’ve found that 
> setting up my initrd script so that it simply performs “cp /proc/vmcore 
> /var/crash/vmcore” results in the same issue.
> 
>  
> 
> I’ve tried this with both crash 4.0-6.3 and 4.0-7.2 and get the same 
> result. Unfortunately I’m locked at kernel 2.6.20 for other reasons, or 
> else I would try that.
> 
>  
> 
> If anyone can offer suggestions of what to try, please let me know. If 
> this is something that has already been resolved elsewhere, sorry to 
> waste time, and if someone can point me to what resolved it, perhaps I 
> can look at backporting the fix myself. Thanks for your time.
> 
>  
> 
> crash-4.0-7.2$ ./crash ~/vmcore ~/targetfiles/vmlinux-2.6.20-17.39-custom2
> 
>  
> 
> crash 4.0-7.2
> 
> <snip>Copyright notices…</snip>
> 
> GNU gdb 6.1
> 
> <snip>Copyright notices…</snip>
> 
> This GDB was configured as "i686-pc-linux-gnu"...
> 
>  
> 
> please wait... (gathering module symbol data)
> 
> WARNING: cannot access vmalloc'd module memory
> 
>  
> 
>       KERNEL: /home/worthk/targetfiles/vmlinux-2.6.20-17.39-custom2
> 
>     DUMPFILE: /home/worthk/vmcore
> 
>         CPUS: 2
> 
>         DATE: Wed Oct  1 12:30:50 2008
> 
>       UPTIME: 00:35:11
> 
> LOAD AVERAGE: 0.07, 0.09, 0.08
> 
>        TASKS: 94
> 
>     NODENAME: test-module
> 
>      RELEASE: 2.6.20-17.39-custom2
> 
>      VERSION: #3 SMP Wed Sep 24 10:11:03 PDT 2008
> 
>      MACHINE: i686  (2200 Mhz)
> 
>       MEMORY: 5 GB
> 
> <6>SysRq : Trigger a crashdump"
> 
>          PID: 4304
> 
>      COMMAND: "bash"
> 
>         TASK: 5d7e9030  [THREAD_INFO: f4b70000]
> 
>          CPU: 0
> 
>        STATE: TASK_RUNNING (SYSRQ)
> 
>  
> 
> crash> mod -s test
> 
> mod: cannot access vmalloc'd module memory
> 
>  
> 
>  
> 
> My kernel config is a bit outside the norm, in that the VMSPLIT value 
> has been modified to give 3GB of memory the kernelspace and 1GB of 
> memory to userspace. Below is a diff between the default Ubuntu 
> “generic” config and mine:

To be honest with you I'm surprised it comes up at all...

If you do a "crash -d7 vmlinux vmcore", amongst the reams of
debug data you will see the readmem() that failed just prior
to the "WARNING: cannot access vmalloc'd module memory".  And
that will probably be the very first access of a vmalloc'd
virtual memory address.  Probably it's best to enter
"crash -d7 vmlinux vmcore > /tmp/junk", and then enter "q"
to silently kill the session.

For that matter, once you come up, I'm guessing that user
virtual address translation will fail as well.  Come up
as you did above, do a "vm" command on the "bash" task,
and then a "vtop" on a user virtual address.

Like this example:

crash> vm
PID: 25479  TASK: f6f2aaa0  CPU: 3   COMMAND: "bash"
    MM       PGD      RSS    TOTAL_VM
f6e3d740  f745c980  1560k    4608k
   VMA       START      END    FLAGS  FILE
f6c115f4    110000    112000     75  /lib/
f7212f94    112000    113000 100071  /lib/
f78cd0cc    113000    114000 100073  /lib/
f6954d84    584000    585000 8000075
f7241bcc    5b1000    5b4000     75  /lib/libtermcap.so.2.0.8.#prelink#.YYRDOu
f7212a14    5b4000    5b5000 100073  /lib/libtermcap.so.2.0.8.#prelink#.YYRDOu
f6e1a64c    61a000    623000     75  /lib/libnss_files-2.5.so
f73f738c    623000    624000 100071  /lib/libnss_files-2.5.so
f6eb79bc    624000    625000 100073  /lib/libnss_files-2.5.so
f7212f3c    719000    733000    875  /lib/
f721238c    733000    734000 100871  /lib/
f72e2b1c    734000    735000 100873  /lib/
f73f7964    ab5000    bf2000     75  /lib/
f6c11ee4    bf2000    bf4000 100071  /lib/
f73f7ee4    bf4000    bf5000 100073  /lib/
f73f7f3c    bf5000    bf8000 100073
f721217c   8048000   80f5000   1875  /bin/
f6cab90c   80f5000   80fa000 101873  /bin/
f724143c   80fa000   80ff000 100073
f68cf7ac   8574000   85aa000 100073
f6d354ec  b7d81000  b7f81000     71  /usr/lib/locale/locale-archive
f7594d84  b7f81000  b7f84000 100073
f6f24124  b7f84000  b7f85000 100073
f72e2d2c  b7f85000  b7f8c000     d1  /usr/lib/gconv/gconv-modules.cache
f72418b4  bfa8a000  bfa9f000 100173
crash> vtop 584000
VIRTUAL   PHYSICAL
584000    37cd6000

PAGE DIRECTORY: f745c980
   PGD: f745c980 => 369e0001
   PMD: 369e0010 => 11b7c1067
   PTE: 11b7c1c20 => 37cd6025
  PAGE: 37cd6000

   PTE     PHYSICAL  FLAGS
37cd6025  37cd6000  (PRESENT|USER|ACCESSED)

   VMA       START      END    FLAGS  FILE
f6954d84    584000    585000 8000075

   PAGE     PHYSICAL   MAPPING    INDEX CNT FLAGS
c16f9ac0   37cd6000         0         0 48 80000004
crash>

Does the vtop command fall apart somewhere?
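
For anyone following along, the PGD/PMD/PTE lines in the example vtop output can be reproduced by hand. The sketch below assumes a PAE kernel with 4 KB pages (the 64-bit entries in the output suggest PAE): 2 bits of PGD index, 9 bits of PMD index, 9 bits of PTE index, and 8-byte entries. All function and macro names are hypothetical helpers for illustration, not crash internals.

```c
#include <stdint.h>

#define PAE_ENTRY_SIZE 8ULL                   /* PAE entries are 64 bits wide */
#define PAE_ADDR_MASK  0x000ffffffffff000ULL  /* physical frame bits of an entry */
#define PAGE_BYTE_MASK 0xfffULL               /* byte offset within a 4 KB page */

/* Index into each level of the three-level PAE page table. */
unsigned pae_pgd_index(uint32_t va) { return va >> 30; }
unsigned pae_pmd_index(uint32_t va) { return (va >> 21) & 0x1ff; }
unsigned pae_pte_index(uint32_t va) { return (va >> 12) & 0x1ff; }

/* Physical address of entry `index` in the table that `entry` points to. */
uint64_t pae_next_entry(uint64_t entry, unsigned index)
{
    return (entry & PAE_ADDR_MASK) + index * PAE_ENTRY_SIZE;
}

/* Final physical address: frame from the PTE plus the offset in the page. */
uint64_t pae_page_phys(uint64_t pte, uint32_t va)
{
    return (pte & PAE_ADDR_MASK) | (va & PAGE_BYTE_MASK);
}
```

For the address 584000 in the example, the PMD index is 2, so the PMD entry sits at 369e0000 + 2*8 = 369e0010; the PTE index is 0x184, so the PTE sits at 11b7c1000 + 0x184*8 = 11b7c1c20; and the PTE value 37cd6025 yields physical page 37cd6000. If vtop goes wrong on the modified-split kernel, comparing its output against this arithmetic may show which level is off.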

BTW, if you haven't done it already, you should also
take the dumpfile out of the picture, and just run crash
on the live system.  If by some stretch of the imagination
*that* works, then you might have to point the finger back
at kdump operation.

In any case, at least you've got a situation where crash
can at least deal with unity-mapped addresses.  With those
addresses it doesn't have to do any kind of page-table
walk-throughs.
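
To make the unity-mapped case concrete: for a kernel virtual address inside the direct (lowmem) mapping, the physical address is simply the virtual address minus PAGE_OFFSET. The helper unity_vtop() below is a hypothetical illustration, not a crash function, and it is only meaningful for addresses that really lie in the direct mapping; vmalloc'd and user addresses need a page-table walk.

```c
#include <stdint.h>

/*
 * Hypothetical helper: physical address of a unity-mapped (lowmem)
 * kernel virtual address. The caller must ensure vaddr >= page_offset
 * and that the address belongs to the direct mapping.
 */
uint32_t unity_vtop(uint32_t vaddr, uint32_t page_offset)
{
    return vaddr - page_offset;
}
```

With the reported PAGE_OFFSET of 0x40000000, and assuming the task address 5d7e9030 lies in the direct mapping, it would translate to physical 1d7e9030.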

I'm guessing that there's something in x86.c's x86_init() function
in the PRE_GDB section that's not correct for your setup.
There is support for Red Hat's older "hugemem" 4G/4G split
kernels, where both the kernel and user space have 4G
(over-lapping) virtual address regions, and so there may be
some confusion there with yours.

For starters, bring up the session as you did above,
and enter "help -v" and "help -m".  They're debug
options that dump a couple internal crash data structures
which may shed some light.

Dave

Worth, Kevin | 2 Oct 2008 02:03

RE: "cannot access vmalloc'd module memory" when loading kdump'ed vmcore in crash

Tried running crash on a running kernel... seems that 4.0-3.7 doesn't like my kernel. When I run crash
4.0-7.2 on a live system, it appears that it has no problems with vmalloc'd module memory.

crash 4.0-3.7
...
GNU gdb 6.1
...
This GDB was configured as "i686-pc-linux-gnu"...

crash: /boot/System.map-2.6.20-17.39-custom2 and /dev/mem do not match!

Usage:
  crash [-h [opt]][-v][-s][-i file][-d num] [-S] [mapfile] [namelist] [dumpfile]

Enter "crash -h" for details.

crash 4.0-7.2
...
GNU gdb 6.1
...
This GDB was configured as "i686-pc-linux-gnu"...

      KERNEL: vmlinux-2.6.20-17.39-custom2
    DUMPFILE: /dev/mem
        CPUS: 2
        DATE: Wed Oct  1 16:31:39 2008
      UPTIME: 04:57:53
LOAD AVERAGE: 0.10, 0.09, 0.09
       TASKS: 95
    NODENAME: ProCurve-TMS-zl-Module
     RELEASE: 2.6.20-17.39-custom2
     VERSION: #3 SMP Wed Sep 24 10:11:03 PDT 2008
     MACHINE: i686  (2200 Mhz)
      MEMORY: 5 GB
         PID: 15801
     COMMAND: "crash"
        TASK: 47bd6030  [THREAD_INFO: 4a8a8000]
         CPU: 1
       STATE: TASK_RUNNING (ACTIVE)
crash>

Since that seems ok (and I don't encounter the error) I'll run crash with -d7 on the dump file to hopefully
expose what is wrong with either the dump or with crash.

I've attached the output of crash with -d7... not sure how the mailing list handles file attachments, but if
needed I can paste the text (or, if there is something specific I should look for, let me know and I can paste
just that section).

-Kevin


--
Crash-utility mailing list
Crash-utility <at> redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility
Attachment (crash.log): application/octet-stream, 97 KiB
Tried running crash on a running kernel... seems that 4.0-3.7 doesn't like my kernel. When I run crash
4.0-7.2 on a live system, it appears that it has no problems with vmalloc'd module memory.

crash 4.0-3.7
...
GNU gdb 6.1
...
This GDB was configured as "i686-pc-linux-gnu"...

crash: /boot/System.map-2.6.20-17.39-custom2 and /dev/mem do not match!

Usage:
  crash [-h [opt]][-v][-s][-i file][-d num] [-S] [mapfile] [namelist] [dumpfile]

Enter "crash -h" for details.

crash 4.0-7.2
...
GNU gdb 6.1
...
This GDB was configured as "i686-pc-linux-gnu"...

      KERNEL: vmlinux-2.6.20-17.39-custom2
    DUMPFILE: /dev/mem
        CPUS: 2
        DATE: Wed Oct  1 16:31:39 2008
      UPTIME: 04:57:53
LOAD AVERAGE: 0.10, 0.09, 0.09
       TASKS: 95
    NODENAME: ProCurve-TMS-zl-Module
     RELEASE: 2.6.20-17.39-custom2
     VERSION: #3 SMP Wed Sep 24 10:11:03 PDT 2008
     MACHINE: i686  (2200 Mhz)
      MEMORY: 5 GB
         PID: 15801
     COMMAND: "crash"
        TASK: 47bd6030  [THREAD_INFO: 4a8a8000]
         CPU: 1
       STATE: TASK_RUNNING (ACTIVE)
crash>

Since that seems ok (and I don't encounter the error) I'll run crash with -d7 on the dump file to hopefully
expose what is wrong with either the dump or with crash.

I've attached the output of crash with -d7... not sure how the mailing like handles file attachments, but if
needed I can paste the text. (or if there is something specific I should look for let me know and I can paste
just that section).

-Kevin

-----Original Message-----
From: crash-utility-bounces <at> redhat.com [mailto:crash-utility-bounces <at> redhat.com] On Behalf Of
Dave Anderson
Sent: Wednesday, October 01, 2008 12:44 PM
To: Discussion list for crash utility usage, maintenance and development
Subject: Re: [Crash-utility] "cannot access vmalloc'd module memory" when loading kdump'ed vmcore in crash

Worth, Kevin wrote:
> Hello kexec and crash mailing lists,
>
>
>
> Sorry to spam whoever's code this ISN'T an issue with, but I really am
> unsure of whether is a kdump or a crash issue. I am running an Ubuntu
> 7.04 with a 2.6.20 kernel (includes Ubuntus patches- source at
> http://packages.ubuntu.com/feisty/linux-source-2.6.20 ) and a modified
> VMSPLIT/PAGE_OFFSET value (see bottom for details) on an i386 machine
> with 4GB of memory. At first I thought this could be an issue with
> makedumpfile stripping out things it shouldn't, but I've found that
> setting up my initrd script so that it simply performs "cp /proc/vmcore
> /var/crash/vmcore" results in the same issue.
>
>
>
> I've tried this with both crash 4.0-6.3 and 4.0-7.2 and get the same
> result. Unfortunately I'm locked at kernel 2.6.20 for other reasons, or
> else I would try that.
>
>
>
> If anyone can offer suggestions of what to try, please let me know. If
> this is something that has already been resolved elsewhere, sorry to
> waste time, and if someone can point me to what resolved it, perhaps I
> can look at backporting the fix myself. Thanks for your time.
>
>
>
> crash-4.0-7.2$ ./crash ~/vmcore ~/targetfiles/vmlinux-2.6.20-17.39-custom2
>
> crash 4.0-7.2
> <snip>Copyright notices...</snip>
> GNU gdb 6.1
> <snip>Copyright notices...</snip>
> This GDB was configured as "i686-pc-linux-gnu"...
>
> please wait... (gathering module symbol data)
> WARNING: cannot access vmalloc'd module memory
>
>       KERNEL: /home/worthk/targetfiles/vmlinux-2.6.20-17.39-custom2
>     DUMPFILE: /home/worthk/vmcore
>         CPUS: 2
>         DATE: Wed Oct  1 12:30:50 2008
>       UPTIME: 00:35:11
> LOAD AVERAGE: 0.07, 0.09, 0.08
>        TASKS: 94
>     NODENAME: test-module
>      RELEASE: 2.6.20-17.39-custom2
>      VERSION: #3 SMP Wed Sep 24 10:11:03 PDT 2008
>      MACHINE: i686  (2200 Mhz)
>       MEMORY: 5 GB
> <6>SysRq : Trigger a crashdump"
>          PID: 4304
>      COMMAND: "bash"
>         TASK: 5d7e9030  [THREAD_INFO: f4b70000]
>          CPU: 0
>        STATE: TASK_RUNNING (SYSRQ)
>
> crash> mod -s test
> mod: cannot access vmalloc'd module memory
>
> My kernel config is a bit outside the norm, in that the VMSPLIT value
> has been modified to give 3GB of memory to kernelspace and 1GB of
> memory to userspace. Below is a diff between the default Ubuntu
> "generic" config and mine:

To be honest with you I'm surprised it comes up at all...

If you do a "crash -d7 vmlinux vmcore", amongst the reams of
debug data you will see the readmem() that failed just prior
to the "WARNING: cannot access vmalloc'd module memory".  And
that will probably be the very first access of a vmalloc'd
virtual memory address.  Probably it's best to enter
"crash -d7 vmlinux vmcore > /tmp/junk", and then enter "q"
to silently kill the session.
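
Once the output is in a file, the readmem() calls just ahead of
the warning can be pulled out with a few lines of script.  The
following is only a sketch, not part of crash: it assumes the -d7
debug lines have the form <readmem: ADDR, TYPE, "label", SIZE, ...>,
and the function name and the sample text are made up for
illustration.

```python
import re

# Assumed shape of a -d7 debug line:
#   <readmem: f9088280, KVADDR, "module struct", 1536, (ROE|Q), 842a5e0>
READMEM = re.compile(
    r'<readmem: (?P<addr>[0-9a-f]+), (?P<type>\w+), "(?P<what>[^"]*)", (?P<size>\d+),'
)

def readmems_before_warning(
    log_text,
    warning="WARNING: cannot access vmalloc'd module memory",
    keep=4,
):
    """Return the last `keep` readmem records seen before the warning line."""
    recent = []
    for line in log_text.splitlines():
        if warning in line:
            return recent[-keep:]
        m = READMEM.search(line)
        if m:
            recent.append((m["addr"], m["type"], m["what"], int(m["size"])))
    return []  # the warning never appeared in this log

# Illustrative sample text only; a real log is far longer.
sample = '''\
<readmem: 403c63a4, KVADDR, "modules", 4, (FOE), 83ff8cc>
please wait... (gathering module symbol data)
<readmem: f9088280, KVADDR, "module struct", 1536, (ROE|Q), 842a5e0>
WARNING: cannot access vmalloc'd module memory
'''

for record in readmems_before_warning(sample):
    print(record)
```

If the real log uses a different layout, only the regular
expression should need adjusting.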

For that matter, once you come up, I'm guessing that user
virtual address translation will fail as well.  Come up
as you did above, do a "vm" command on the "bash" task,
and then a "vtop" on a user virtual address.

Like this example:

crash> vm
PID: 25479  TASK: f6f2aaa0  CPU: 3   COMMAND: "bash"
    MM       PGD      RSS    TOTAL_VM
f6e3d740  f745c980  1560k    4608k
   VMA       START      END    FLAGS  FILE
f6c115f4    110000    112000     75  /lib/
f7212f94    112000    113000 100071  /lib/
f78cd0cc    113000    114000 100073  /lib/
f6954d84    584000    585000 8000075
f7241bcc    5b1000    5b4000     75  /lib/libtermcap.so.2.0.8.#prelink#.YYRDOu
f7212a14    5b4000    5b5000 100073  /lib/libtermcap.so.2.0.8.#prelink#.YYRDOu
f6e1a64c    61a000    623000     75  /lib/libnss_files-2.5.so
f73f738c    623000    624000 100071  /lib/libnss_files-2.5.so
f6eb79bc    624000    625000 100073  /lib/libnss_files-2.5.so
f7212f3c    719000    733000    875  /lib/
f721238c    733000    734000 100871  /lib/
f72e2b1c    734000    735000 100873  /lib/
f73f7964    ab5000    bf2000     75  /lib/
f6c11ee4    bf2000    bf4000 100071  /lib/
f73f7ee4    bf4000    bf5000 100073  /lib/
f73f7f3c    bf5000    bf8000 100073
f721217c   8048000   80f5000   1875  /bin/
f6cab90c   80f5000   80fa000 101873  /bin/
f724143c   80fa000   80ff000 100073
f68cf7ac   8574000   85aa000 100073
f6d354ec  b7d81000  b7f81000     71  /usr/lib/locale/locale-archive
f7594d84  b7f81000  b7f84000 100073
f6f24124  b7f84000  b7f85000 100073
f72e2d2c  b7f85000  b7f8c000     d1  /usr/lib/gconv/gconv-modules.cache
f72418b4  bfa8a000  bfa9f000 100173
crash> vtop 584000
VIRTUAL   PHYSICAL
584000    37cd6000

PAGE DIRECTORY: f745c980
   PGD: f745c980 => 369e0001
   PMD: 369e0010 => 11b7c1067
   PTE: 11b7c1c20 => 37cd6025
  PAGE: 37cd6000

   PTE     PHYSICAL  FLAGS
37cd6025  37cd6000  (PRESENT|USER|ACCESSED)

   VMA       START      END    FLAGS  FILE
f6954d84    584000    585000 8000075

   PAGE     PHYSICAL   MAPPING    INDEX CNT FLAGS
c16f9ac0   37cd6000         0         0 48 80000004
crash>

Does the vtop command fall apart somewhere?
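
To check a vtop result like the one above by hand, the PTE value
can be split into a page address and flag bits.  This is a minimal
Python sketch, assuming 4KB pages and the standard low-order x86
page-table flag bits; decode_pte is a hypothetical helper, not a
crash function.

```python
PAGE_MASK = 0xFFF  # low 12 bits of an entry hold flags (4KB pages assumed)

# Low-order flag bits of an x86 page-table entry.
FLAGS = [
    (0x001, "PRESENT"),
    (0x002, "RW"),
    (0x004, "USER"),
    (0x008, "PWT"),
    (0x010, "PCD"),
    (0x020, "ACCESSED"),
    (0x040, "DIRTY"),
]

def decode_pte(pte, vaddr):
    """Return (physical address of vaddr, list of flag names) for a PTE."""
    page = pte & ~PAGE_MASK
    names = [name for bit, name in FLAGS if pte & bit]
    return page | (vaddr & PAGE_MASK), names

# Values taken from the vtop example: PTE 37cd6025 for address 584000.
phys, flags = decode_pte(0x37cd6025, 0x584000)
print(hex(phys), "|".join(flags))
```

With the PTE value 37cd6025 from the example, this gives physical
address 37cd6000 and PRESENT|USER|ACCESSED, in agreement with the
vtop display.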

BTW, if you haven't done it already, you should also
take the dumpfile out of the picture, and just run crash
on the live system.  If by some stretch of the imagination
*that* works, then you might have to point the finger back
at kdump operation.

In any case, you've got a situation where crash
can at least deal with unity-mapped addresses.  With those
addresses it doesn't have to do any kind of page-table
walk-throughs.
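
For illustration, a unity-mapped translation is just a subtraction
of PAGE_OFFSET.  The sketch below assumes a PAGE_OFFSET of
0x40000000, which is what a 1GB-user/3GB-kernel split would
normally imply; the actual value in the modified config is not
shown here, and unity_vtop is a hypothetical name.  It also ignores
the upper bound of the unity-mapped region.

```python
PAGE_OFFSET = 0x40000000  # assumption: 1GB user / 3GB kernel split

def unity_vtop(vaddr, page_offset=PAGE_OFFSET):
    """Translate a unity-mapped kernel virtual address by subtraction.

    No page tables are consulted.  The upper limit of the unity-mapped
    region (where vmalloc space begins) is deliberately not checked.
    """
    if vaddr < page_offset:
        raise ValueError("not a kernel virtual address: %x" % vaddr)
    return vaddr - page_offset

# TASK address 5d7e9030 from the dumpfile session output.
print(hex(unity_vtop(0x5d7e9030)))
```

Under that assumption the task address 5d7e9030 would sit at
physical address 1d7e9030.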

I'm guessing that there's something in x86.c's x86_init() function
in the PRE_GDB section that's not correct for your setup.
There is support for Red Hat's older "hugemem" 4G/4G split
kernels, where both the kernel and user space have 4G
(over-lapping) virtual address regions, and so there may be
some confusion there with yours.

For starters, bring up the session as you did above,
and enter "help -v" and "help -m".  They're debug
options that dump a couple internal crash data structures
which may shed some light.

Dave

--
Crash-utility mailing list
Crash-utility <at> redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility
Dave Anderson | 2 Oct 2008 16:49

Re: "cannot access vmalloc'd module memory" when loading kdump'ed vmcore in crash

Worth, Kevin wrote:
> Tried running crash on a running kernel... seems that 4.0-3.7 doesn't like my kernel. When I run crash
> 4.0-7.2 on a live system, it appears that it has no problems with vmalloc'd module memory.
> 
> crash 4.0-3.7
> ...
> GNU gdb 6.1
> ...
> This GDB was configured as "i686-pc-linux-gnu"...
> 
> crash: /boot/System.map-2.6.20-17.39-custom2 and /dev/mem do not match!
> 
> Usage:
>   crash [-h [opt]][-v][-s][-i file][-d num] [-S] [mapfile] [namelist] [dumpfile]
> 
> Enter "crash -h" for details.
> 
> 
> crash 4.0-7.2
> ...
> GNU gdb 6.1
> ...
> This GDB was configured as "i686-pc-linux-gnu"...
> 
>       KERNEL: vmlinux-2.6.20-17.39-custom2
>     DUMPFILE: /dev/mem
>         CPUS: 2
>         DATE: Wed Oct  1 16:31:39 2008
>       UPTIME: 04:57:53
> LOAD AVERAGE: 0.10, 0.09, 0.09
>        TASKS: 95
>     NODENAME: ProCurve-TMS-zl-Module
>      RELEASE: 2.6.20-17.39-custom2
>      VERSION: #3 SMP Wed Sep 24 10:11:03 PDT 2008
>      MACHINE: i686  (2200 Mhz)
>       MEMORY: 5 GB
>          PID: 15801
>      COMMAND: "crash"
>         TASK: 47bd6030  [THREAD_INFO: 4a8a8000]
>          CPU: 1
>        STATE: TASK_RUNNING (ACTIVE)
> crash>
> 
> Since that seems ok (and I don't encounter the error) I'll run crash with -d7 on the dump file to hopefully
> expose what is wrong with either the dump or with crash.
>
> I've attached the output of crash with -d7... not sure how the mailing list handles file attachments, but
> if needed I can paste the text. (or if there is something specific I should look for let me know and I can paste
> just that section).

Yeah, crash 4.0-3.7 is 2 years old, which is pretty ancient.
Plus I'm only interested in helping out with the latest version.

But according to the above, 4.0-7.2 works OK on the live system?
You can do a "mod" command and it works OK?

Sometimes on larger-memory systems, running live using /dev/mem,
you might see the "WARNING: cannot access vmalloc'd module memory"
message because the physical memory that is backing the
vmalloc'd virtual address is in highmem, and cannot be
accessed by /dev/mem.  In any case, it appears that the
module structures have all been read successfully on your
live system.

And that's kind of bothersome, because for all practical
purposes, the crash utility doesn't care where it's getting
the physical memory from (i.e., from /dev/mem or from the
dumpfile).  And if it works on the live system, it should
work with the dumpfile.

Anyway, looking at the crash.log, here's what's happening:

Everything was running fine until the module initialization
step.  The list of installed kernel modules is headed up
from the "modules" list_head symbol at 403c63a4, which
contains a pointer to the first module structure at
vmalloc address f9088280:

   ...
   <readmem: 403c63a4, KVADDR, "modules", 4, (FOE), 83ff8cc>
   please wait... (gathering module symbol data)
   module: f9088280

The readmem() of that first module -- and the very first vmalloc
address -- at f9088280 required a page table translation:

   <readmem: f9088280, KVADDR, "module struct", 1536, (ROE|Q), 842a5e0>
   <readmem: 4044b000, KVADDR, "pgd page", 32, (FOE), 845a308>
   <readmem: 6000, PHYSADDR, "pmd page", 4096, (FOE), 845b310>
   <readmem: 1d515000, PHYSADDR, "page table", 4096, (FOE), 845c318>
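
The index used at each level of that walk can be worked out from
the virtual address alone.  A sketch, assuming PAE paging (the
32-byte "pgd page" read is consistent with four 8-byte entries,
but that is an inference) and using a hypothetical helper name:

```python
ENTRY_SIZE = 8  # PAE page-table entries are 64 bits wide

def pae_indices(vaddr):
    """Split a 32-bit virtual address into PAE fields (2 + 9 + 9 + 12 bits)."""
    pgd = (vaddr >> 30) & 0x3
    pmd = (vaddr >> 21) & 0x1FF
    pte = (vaddr >> 12) & 0x1FF
    offset = vaddr & 0xFFF
    return pgd, pmd, pte, offset

# The first module address from the log.
pgd, pmd, pte, off = pae_indices(0xf9088280)
print(pgd, pmd, pte, hex(off))
print("byte offset of the pgd entry:", hex(pgd * ENTRY_SIZE))
```

For f9088280 this gives pgd index 3, pmd index 456 and pte index
136.  Applied to 584000 from the earlier vtop example it gives 0,
2 and 388, which agrees with the entry addresses shown there
(369e0010 = 369e0000 + 2*8, and 11b7c1c20 = 11b7c1000 + 388*8).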

That readmem() appears to have worked, because it thinks it
successfully read the module struct at that address.  But when
it pulled out the address of the *next* module in the linked list,
it read this:

   module: fffffffc

And when it tried to read that bogus address, it failed, and
led to the WARNING message:

   <readmem: fffffffc, KVADDR, "module struct", 1536, (ROE|Q), 842a5e0>
   <readmem: 7000, PHYSADDR, "page table", 4096, (FOE), 845c318>

   crash: invalid kernel virtual address: fffffffc  type: "module struct"

   WARNING: cannot access vmalloc'd module memory
   ...
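
One possible reading of the fffffffc value, offered as a guess
rather than something the log proves: the "modules" list links
list_head members embedded in each module structure, so the
structure address has to be derived by subtracting the member's
offset from the list pointer.  If the "next" field were read back
as zero, that subtraction would wrap around in 32-bit arithmetic.
The sketch assumes the list member sits 4 bytes into the structure
(right after a 4-byte "state" field); the helper name and the
first pointer value are hypothetical.

```python
MASK32 = 0xFFFFFFFF
LIST_OFFSET = 4  # assumption: 'list' directly follows the 4-byte 'state' field

def module_from_list_ptr(list_ptr, list_offset=LIST_OFFSET):
    """Convert a pointer to an embedded list_head into the module address."""
    return (list_ptr - list_offset) & MASK32

# Hypothetical pointer 4 bytes into the first module at f9088280.
print(hex(module_from_list_ptr(0xf9088284)))
# A 'next' field that reads back as zero wraps around.
print(hex(module_from_list_ptr(0x0)))
```

So a module structure whose list pointers read as zero would be
enough to produce fffffffc, without that value ever being stored
in memory.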

Although I cannot say for sure, I'm presuming that the initial
read of the module structure at f9088280 ended up reading from
the wrong location and therefore read garbage.  You can verify
that by bringing up a dumpfile session, and doing this:

   crash> module f9088280

It *should* display something that is recognizable as a module
structure.  For example:

   crash> mod | grep ext3
   f8899080  ext3                123593  (not loaded)  [CONFIG_KALLSYMS]
   crash> module f8899080
   struct module {
     state = MODULE_STATE_LIVE,
     list = {
       next = 0xf8854a84,
       prev = 0xf8876984
     },
     name = "ext3",
     mkobj = {
       kobj = {
         k_name = 0xf88990cc "ext3",
         name = "ext3",
         kref = {
           refcount = {
             counter = 2
           }
         },
     ...

Your attempt will probably show the fffffffc in the list_head
just after the "state" field at the top, as well as a bunch
of other garbage.

And as I suggested in my first reply, can you also verify that
user virtual address translations also fail?  I suggested pulling
a sample virtual address out of the current context's ("bash")
VM, but doing that may "look" like it's working while actually
doing it incorrectly.  So you also need to verify the data
that it finds there.  One way to do that is to read the beginning
of the /bin/bash text segment, and look for the "ELF" string.

For example, here I'm in a "bash" context, similar to the
context that your dumpfile comes up in by default:

   crash> set
       PID: 19839
   COMMAND: "bash"
      TASK: f7b03000  [THREAD_INFO: def66000]
       CPU: 1
     STATE: TASK_INTERRUPTIBLE
   crash>

Dump the virtual memory regions, and find the first VMA
that is backed by "/bin/bash":

   crash> vm
   PID: 19839  TASK: f7b03000  CPU: 1   COMMAND: "bash"
      MM       PGD      RSS    TOTAL_VM
   f6dc5740  f745c9c0  1392k    4532k
     VMA       START      END    FLAGS  FILE
   f69019bc    6fa000    703000     75  /lib/libnss_files-2.5.so
   f69013e4    703000    704000 100071  /lib/libnss_files-2.5.so
   f6901d84    704000    705000 100073  /lib/libnss_files-2.5.so
   f6901284    a7c000    a96000    875  /lib/ld-2.5.so
   f6901b74    a96000    a97000 100871  /lib/ld-2.5.so
   f6901b1c    a97000    a98000 100873  /lib/ld-2.5.so
   f69012dc    a9a000    bd7000     75  /lib/libc-2.5.so
   f690185c    bd7000    bd9000 100071  /lib/libc-2.5.so
   f6901ac4    bd9000    bda000 100073  /lib/libc-2.5.so
   f69017ac    bda000    bdd000 100073
   f6901e8c    bdf000    be1000     75  /lib/libdl-2.5.so
   f6901a6c    be1000    be2000 100071  /lib/libdl-2.5.so
   f6901754    be2000    be3000 100073  /lib/libdl-2.5.so
   f6901f94    c89000    c8c000     75  /lib/libtermcap.so.2.0.8
   f69016fc    c8c000    c8d000 100073  /lib/libtermcap.so.2.0.8
   f6901d2c    fd1000    fd2000 8000075
   f6901124   8047000   80f5000   1875  /bin/bash
   f69018b4   80f5000   80fa000 101873  /bin/bash
   f6901964   80fa000   80ff000 100073
   f690122c   9a75000   9a96000 100073
   f680890c  b7d7f000  b7f7f000     71  /usr/lib/locale/locale-archive
   f6901f3c  b7f7f000  b7f81000 100073
   f68cfb74  b7f82000  b7f84000 100073
   f6dd69bc  b7f84000  b7f8b000     d1  /usr/lib/gconv/gconv-modules.cache
   f69014ec  bf86e000  bf884000 100173
   crash>

You can see above, that in my case the text region starts at
user virtual address 8047000.  That actually points to the
ELF header at the beginning of the "/bin/bash" file, which
starts with a 0x7f followed by the ascii "ELF" characters:

   crash> rd 8047000
   8047000:  464c457f                              .ELF
   crash>

You might want to use "rd -u <address>" to ensure that
crash will presume that the address is a user address,
just in case that's an issue with your setup.
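
The word that rd prints is the first four bytes of the file read
as a little-endian 32-bit value, which is why the bytes appear
reversed in the hex column.  A small sketch of that check
(word_is_elf_magic is a made-up helper name):

```python
import struct

ELF_MAGIC = b"\x7fELF"

def word_is_elf_magic(word):
    """True if a 32-bit little-endian word spells out the ELF magic bytes."""
    return struct.pack("<I", word) == ELF_MAGIC

# The word displayed by "rd 8047000" in the example.
print(struct.pack("<I", 0x464c457f))
print(word_is_elf_magic(0x464c457f))
```

Any other value at the start of the /bin/bash text region would
suggest the translation landed on the wrong physical page.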

Anyway, try the above, and also dump out and save
the output of these debug commands:

   crash> help -m > help.m
   crash> help -k > help.k
   crash> help -v > help.v

But again, given that you seem to be saying that everything
works just fine on the live system, the debugging of this
issue will most likely end up requiring that you determine
where exactly things "go wrong" with the dumpfile in comparison
to the same things working correctly on the live system.

Thanks,
   Dave

Worth, Kevin | 2 Oct 2008 23:59

RE: "cannot access vmalloc'd module memory" when loading kdump'ed vmcore in crash

Yep, I can run mod commands on a live system just fine.

Looks like "next" doesn't point to fffffffc...

crash> module f9088280
struct module {
  state = MODULE_STATE_LIVE,
  list = {
    next = 0x0,
    prev = 0x0
  },
  name = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\0
00\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\0
00\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\0
00\000",
  mkobj = {
    kobj = {
      k_name = 0x0,
      name = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\0
00\000\000",
      kref = {
        refcount = {
          counter = 0
        }
      },
      entry = {
        next = 0x0,
        prev = 0x0
      },
      parent = 0x0,
      kset = 0x0,
      ktype = 0x0,
      dentry = 0x0,
      poll = {
        lock = {
          raw_lock = {
            slock = 0
          }
        },
        task_list = {
          next = 0x0,
          prev = 0x0
        }
      }
    },
    mod = 0x0
  },
...and all the rest of the struct is zeros too...

Does the following mean that user virtual address translations are failing too?

crash> set
    PID: 4304
COMMAND: "bash"
   TASK: 5d7e9030  [THREAD_INFO: f4b70000]
    CPU: 0
  STATE: TASK_RUNNING (SYSRQ)
crash> vm
PID: 4304   TASK: 5d7e9030  CPU: 0   COMMAND: "bash"
   MM       PGD      RSS    TOTAL_VM
f7e7f040  5d5002c0  2616k    3972k
  VMA       START      END    FLAGS  FILE
5fe454ec   8048000   80ee000   1875  /bin/bash
5fe45e34   80ee000   80f3000 101877  /bin/bash
...

crash> rd 8048000
rd: invalid kernel virtual address: 8048000  type: "32-bit KVADDR"
crash> rd -u 8048000
rd: invalid user virtual address: 8048000  type: "32-bit UVADDR"
crash> rd 80ee000
rd: invalid kernel virtual address: 80ee000  type: "32-bit KVADDR"
crash> rd -u 80ee000
rd: invalid user virtual address: 80ee000  type: "32-bit UVADDR"

help.k, .v, .m files attached. Hopefully my results here are meaningful to you, because I don't know nearly
enough about the memory architecture to claim to have a clue about this.

-Kevin

Attachment (help.v): application/octet-stream, 2183 bytes
Attachment (help.k): application/octet-stream, 1611 bytes
Attachment (help.m): application/octet-stream, 2586 bytes
Yep, I can run mod commands on a live system just fine.

Looks like "next" doesn't point to fffffffc...

crash> module f9088280
struct module {
  state = MODULE_STATE_LIVE,
  list = {
    next = 0x0,
    prev = 0x0
  },
  name = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\0
00\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\0
00\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\0
00\000",
  mkobj = {
    kobj = {
      k_name = 0x0,
      name = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\0
00\000\000",
      kref = {
        refcount = {
          counter = 0
        }
      },
      entry = {
        next = 0x0,
        prev = 0x0
      },
      parent = 0x0,
      kset = 0x0,
      ktype = 0x0,
      dentry = 0x0,
      poll = {
        lock = {
          raw_lock = {
            slock = 0
          }
        },
        task_list = {
          next = 0x0,
          prev = 0x0
        }
      }
    },
    mod = 0x0
  },
...and all the rest of the struct is zeros too...

Does the following mean that user virtual address translations are failing too?

crash> set
    PID: 4304
COMMAND: "bash"
   TASK: 5d7e9030  [THREAD_INFO: f4b70000]
    CPU: 0
  STATE: TASK_RUNNING (SYSRQ)
crash> vm
PID: 4304   TASK: 5d7e9030  CPU: 0   COMMAND: "bash"
   MM       PGD      RSS    TOTAL_VM
f7e7f040  5d5002c0  2616k    3972k
  VMA       START      END    FLAGS  FILE
5fe454ec   8048000   80ee000   1875  /bin/bash
5fe45e34   80ee000   80f3000 101877  /bin/bash
...

crash> rd 8048000
rd: invalid kernel virtual address: 8048000  type: "32-bit KVADDR"
crash> rd -u 8048000
rd: invalid user virtual address: 8048000  type: "32-bit UVADDR"
crash> rd 80ee000
rd: invalid kernel virtual address: 80ee000  type: "32-bit KVADDR"
crash> rd -u 80ee000
rd: invalid user virtual address: 80ee000  type: "32-bit UVADDR"

help.k, .v, .m files attached. Hopefully my results here are meaningful to you, because I don't know nearly
enough about the memory architecture to claim to have a clue about this.

-Kevin

-----Original Message-----
From: crash-utility-bounces <at> redhat.com [mailto:crash-utility-bounces <at> redhat.com] On Behalf Of
Dave Anderson
Sent: Thursday, October 02, 2008 7:49 AM
To: Discussion list for crash utility usage, maintenance and development
Subject: Re: [Crash-utility] "cannot access vmalloc'd module memory" when loading kdump'ed vmcore in crash

Worth, Kevin wrote:
> Tried running crash on a running kernel... seems that 4.0-3.7 doesn't like my kernel. When I run crash
4.0-7.2 on a live system, it appears that it has no problems with vmalloc'd module memory.
>
> crash 4.0-3.7
> ...
> GNU gdb 6.1
> ...
> This GDB was configured as "i686-pc-linux-gnu"...
>
> crash: /boot/System.map-2.6.20-17.39-custom2 and /dev/mem do not match!
>
> Usage:
>   crash [-h [opt]][-v][-s][-i file][-d num] [-S] [mapfile] [namelist] [dumpfile]
>
> Enter "crash -h" for details.
>
>
> crash 4.0-7.2
> ...
> GNU gdb 6.1
> ...
> This GDB was configured as "i686-pc-linux-gnu"...
>
>       KERNEL: vmlinux-2.6.20-17.39-custom2
>     DUMPFILE: /dev/mem
>         CPUS: 2
>         DATE: Wed Oct  1 16:31:39 2008
>       UPTIME: 04:57:53
> LOAD AVERAGE: 0.10, 0.09, 0.09
>        TASKS: 95
>     NODENAME: ProCurve-TMS-zl-Module
>      RELEASE: 2.6.20-17.39-custom2
>      VERSION: #3 SMP Wed Sep 24 10:11:03 PDT 2008
>      MACHINE: i686  (2200 Mhz)
>       MEMORY: 5 GB
>          PID: 15801
>      COMMAND: "crash"
>         TASK: 47bd6030  [THREAD_INFO: 4a8a8000]
>          CPU: 1
>        STATE: TASK_RUNNING (ACTIVE)
> crash>
>
> Since that seems ok (and I don't encounter the error) I'll run crash with -d7 on the dump file to hopefully
expose what is wrong with either the dump or with crash.
>
> I've attached the output of crash with -d7... not sure how the mailing like handles file attachments, but
if needed I can paste the text. (or if there is something specific I should look for let me know and I can paste
just that section).

Yeah, crash 4.0-3.7 is 2 years old, which is pretty ancient.
Plus I'm only interested in helping out with the latest version.

But according to the above, 4.0-7.2 works OK on the live system?
You can do a "mod" command and it works OK?

Sometimes on larger-memory systems, running live using /dev/mem,
you might see the "WARNING: cannot access vmalloc'd module"
message because the physical memory that is backing the
vmalloc'd virtual address is in highmem, and cannot be
accessed by /dev/mem.  In any case, it appears that the
module structures have all been read successfully on your
live system.

And that's kind of bothersome, because for all practical
purposes, the crash utility doesn't care where it's getting
the physical memory from (i.e., from /dev/mem or from the
dumpfile).  And if it works on the live system, it should
work with the dumpfile.

Anyway, looking at the crash.log, here's what's happening:

Everything was running fine until the module initialization
step.  The list of installed kernel modules is headed up
from the "modules" list_head symbol at 403c63a4, which
contains a pointer to the first module structure at
vmalloc address f9088280:

   ...
   <readmem: 403c63a4, KVADDR, "modules", 4, (FOE), 83ff8cc>
   please wait... (gathering module symbol data)
   module: f9088280

The readmem() of that first module -- and the very first vmalloc
address -- at f9088280 required a page table translation:

   <readmem: f9088280, KVADDR, "module struct", 1536, (ROE|Q), 842a5e0>
   <readmem: 4044b000, KVADDR, "pgd page", 32, (FOE), 845a308>
   <readmem: 6000, PHYSADDR, "pmd page", 4096, (FOE), 845b310>
   <readmem: 1d515000, PHYSADDR, "page table", 4096, (FOE), 845c318>

That readmem() appears to have worked, because it thinks it
successfully read the module struct at that address.  But when
it pulled out the address of the *next* module in the linked list,
it read this:

   module: fffffffc

And when it tried to read that bogus address, it failed, and
led to the WARNING message:

   <readmem: fffffffc, KVADDR, "module struct", 1536, (ROE|Q), 842a5e0>
   <readmem: 7000, PHYSADDR, "page table", 4096, (FOE), 845c318>

   crash: invalid kernel virtual address: fffffffc  type: "module struct"

   WARNING: cannot access vmalloc'd module memory
   ...

Although I cannot say for sure, I'm presuming that the initial
read of the module structure at f9088280 ended up reading from
the wrong location and therefore read garbage.  You can verify
that by bringing the a dumpfile session, and doing this:

   crash> module f9088280

It *should* display something that is recognizable as a module
structure.  For example:

   crash> mod | grep ext3
   f8899080  ext3                123593  (not loaded)  [CONFIG_KALLSYMS]
   crash> module f8899080
   struct module {
     state = MODULE_STATE_LIVE,
     list = {
       next = 0xf8854a84,
       prev = 0xf8876984
     },
     name = "ext3"
     mkobj = {
       kobj = {
         k_name = 0xf88990cc "ext3",
         name = "ext3",
         kref = {
           refcount = {
             counter = 2
           }
         },
     ...

Your attempt will probably show the fffffffc in the list_head
just after the "state" field at the top, as well as a bunch
of other garbage.

And as I suggested in my first reply, can you also verify that
user virtual address translations also fail?  I suggested pulling
a sample virtual address out of the current context's ("bash")
VM, but doing that may "look" like it's working, but it may
be doing it incorrectly.  So you also need to verify the data
that it finds there.  One way to do that is to read the beginning
of the /bin/bash text segment, and look for "ELF" string.

For example, here I'm in a "bash" context, similar to the
context that your dumpfile comes up in by default:

   crash> set
       PID: 19839
   COMMAND: "bash"
      TASK: f7b03000  [THREAD_INFO: def66000]
       CPU: 1
     STATE: TASK_INTERRUPTIBLE
   crash>

Dump the virtual memory regions, and find the first VMA
that is backed by "/bin/bash":

   crash> vm
   PID: 19839  TASK: f7b03000  CPU: 1   COMMAND: "bash"
      MM       PGD      RSS    TOTAL_VM
   f6dc5740  f745c9c0  1392k    4532k
     VMA       START      END    FLAGS  FILE
   f69019bc    6fa000    703000     75  /lib/libnss_files-2.5.so
   f69013e4    703000    704000 100071  /lib/libnss_files-2.5.so
   f6901d84    704000    705000 100073  /lib/libnss_files-2.5.so
   f6901284    a7c000    a96000    875  /lib/ld-2.5.so
   f6901b74    a96000    a97000 100871  /lib/ld-2.5.so
   f6901b1c    a97000    a98000 100873  /lib/ld-2.5.so
   f69012dc    a9a000    bd7000     75  /lib/libc-2.5.so
   f690185c    bd7000    bd9000 100071  /lib/libc-2.5.so
   f6901ac4    bd9000    bda000 100073  /lib/libc-2.5.so
   f69017ac    bda000    bdd000 100073
   f6901e8c    bdf000    be1000     75  /lib/libdl-2.5.so
   f6901a6c    be1000    be2000 100071  /lib/libdl-2.5.so
   f6901754    be2000    be3000 100073  /lib/libdl-2.5.so
   f6901f94    c89000    c8c000     75  /lib/libtermcap.so.2.0.8
   f69016fc    c8c000    c8d000 100073  /lib/libtermcap.so.2.0.8
   f6901d2c    fd1000    fd2000 8000075
   f6901124   8047000   80f5000   1875  /bin/bash
   f69018b4   80f5000   80fa000 101873  /bin/bash
   f6901964   80fa000   80ff000 100073
   f690122c   9a75000   9a96000 100073
   f680890c  b7d7f000  b7f7f000     71  /usr/lib/locale/locale-archive
   f6901f3c  b7f7f000  b7f81000 100073
   f68cfb74  b7f82000  b7f84000 100073
   f6dd69bc  b7f84000  b7f8b000     d1  /usr/lib/gconv/gconv-modules.cache
   f69014ec  bf86e000  bf884000 100173
   crash>

You can see above that in my case the text region starts at
user virtual address 8047000.  That actually points to the
ELF header at the beginning of the "/bin/bash" file, which
starts with a 0x7f followed by the ascii "ELF" characters:

   crash> rd 8047000
   8047000:  464c457f                              .ELF
   crash>

You might want to use "rd -u <address>" to ensure that
crash will presume that the address is a user address,
just in case that's an issue with your setup.
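
As a small illustration of why the word 464c457f reads as ".ELF":
on a little-endian x86 machine the low-order byte comes first in
memory, so the bytes are 7f 45 4c 46, i.e. 0x7f 'E' 'L' 'F'.  The
following is only a sketch for checking such a word by hand; the
function names are made up for this example and are not part of
crash.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The four bytes that begin every ELF file: 0x7f 'E' 'L' 'F'. */
static const unsigned char elf_magic[4] = { 0x7f, 'E', 'L', 'F' };

/*
 * Store a 32-bit word, as printed by "rd" on a little-endian machine,
 * into out[] in the byte order it occupies in memory.
 * (Hypothetical helper, for illustration only.)
 */
void word_to_le_bytes(uint32_t word, unsigned char out[4])
{
	assert(out != NULL);
	for (int i = 0; i < 4; i++)
		out[i] = (unsigned char)((word >> (8 * i)) & 0xff);
}

/* Return 1 if the word holds the ELF magic, 0 otherwise. */
int word_is_elf_magic(uint32_t word)
{
	unsigned char bytes[4];

	word_to_le_bytes(word, bytes);
	return memcmp(bytes, elf_magic, sizeof(elf_magic)) == 0;
}
```

Passing the value from the "rd" output above (0x464c457f) should
match, while a garbage word such as 0xfffffffc should not.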

Anyway, try the above, and also dump out and save
the output of these debug commands:

   crash> help -m > help.m
   crash> help -k > help.k
   crash> help -v > help.v

But again, given that you seem to be saying that everything
works just fine on the live system, the debugging of this
issue will most likely end up requiring that you determine
where exactly things "go wrong" with the dumpfile in comparison
to the same things working correctly on the live system.

Thanks,
   Dave

--
Crash-utility mailing list
Crash-utility <at> redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility
Itsuro ODA | 3 Oct 2008 08:02

Re: crash can't analyze memory dumpfile of Xen

Hi,

I found the root cause of this problem is that the value of "PERCPU_SHIFT"
was changed to 13 from 12.

The quick workaround is to apply the following patch to the crash command:
----------------------------------------------------------------------
--- xen_hyper_defs.h.org        2008-10-03 14:46:28.000000000 +0900
+++ xen_hyper_defs.h    2008-10-03 14:46:50.000000000 +0900
@@ -134,7 +134,7 @@
 #endif

 #if defined(X86) || defined(X86_64)
-#define XEN_HYPER_PERCPU_SHIFT 12
+#define XEN_HYPER_PERCPU_SHIFT 13
 #define xen_hyper_per_cpu(var, cpu)  \
        ((ulong)(var) + (((ulong)(cpu))<<XEN_HYPER_PERCPU_SHIFT))
 #elif defined(IA64)
------------------------------------------------------------------------

I need to think about backward compatibility. I wonder how to determine
the value of "PERCPU_SHIFT". The change to "PERCPU_SHIFT" was made at
a certain point in xen-unstable before the xen-3.3 release, so the xen
version number (3.3) can't be used as the key... I will consider it more...
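
To make the effect of the shift concrete, here is a rough sketch
(not the actual crash code) of the same arithmetic as the
xen_hyper_per_cpu() macro, with the shift passed in at run time
instead of being fixed at compile time.  The function name and the
two constant names are hypothetical, and how the right shift would
be detected for a given hypervisor is still the open question.

```c
#include <assert.h>
#include <limits.h>

/* The two PERCPU_SHIFT values discussed in this thread (x86/x86_64). */
#define PERCPU_SHIFT_OLD 12	/* before the xen-unstable change */
#define PERCPU_SHIFT_NEW 13	/* after the xen-unstable change */

/*
 * Same computation as xen_hyper_per_cpu(var, cpu), but "shift" is an
 * argument.  (Hypothetical helper, for illustration only.)
 */
unsigned long per_cpu_addr(unsigned long var, unsigned long cpu,
			   unsigned int shift)
{
	/* A shift as wide as the type would be undefined behavior. */
	assert(shift < sizeof(unsigned long) * CHAR_BIT);
	return var + (cpu << shift);
}
```

With the wrong shift, every CPU other than CPU 0 gets a wrong
per-cpu address: for cpu 1 the offset is 0x1000 with shift 12 but
0x2000 with shift 13.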

Thanks.
Itsuro Oda

On Fri, 05 Sep 2008 13:42:46 +0900
Itsuro ODA <oda <at> valinux.co.jp> wrote:

> Hi,
> 
> I received the dump file via FTP from Yuji and I could reproduce
> the problem.
> 
> Hmm, it seems the format of the crash note section is not as expected.
> (and there is another problem; "jiffies" is lost.)
> I will check more deeply.
> 
> Until the problem is fixed, try the following quick hack.
> ------------------------------------------------------------------
> --- xen_hyper.c.org	2008-09-05 12:48:57.000000000 +0900
> +++ xen_hyper.c	2008-09-05 13:32:28.000000000 +0900
> @@ -150,7 +150,7 @@
>  	 * Do some initialization.
>  	 */
>  #ifndef IA64
> -	xen_hyper_dumpinfo_init();
> +//	xen_hyper_dumpinfo_init(); /* XXX: should be fixed !! */
>  #endif
>  	xhmachdep->pcpu_init();
>  	xen_hyper_domain_init();
> @@ -1746,9 +1746,11 @@
>  			tmp2 = (ulong)jiffies_64;
>  			jiffies_64 = (ulonglong)(tmp2 - tmp1);
>  		}
> -	} else {
> +	} else if (symbol_exists("jiffies")) {
>  		get_symbol_data("jiffies", sizeof(long), &jiffies);
>  		jiffies_64 = (ulonglong)jiffies;
> +	} else {
> +		jiffies_64 = 0; /* XXX: find alternative !! */
>  	}
>  
>  	return jiffies_64;
> ------------------------------------------------------------------
> (the "dumpinfo" sub command cannot be used.)
> 
> Thanks.
> Itsuro Oda
> 
> On Thu, 04 Sep 2008 16:29:51 +0900
> Yuji Shimada <shimada-yxb <at> necst.nec.co.jp> wrote:
> 
> > Hi ODA-san,
> > 
> > Thank you so much for your reply.
> > 
> > > What arch did you use ?
> > 
> > I used x86_64 arch.
> > 
> > > If you send me xen-syms-3.3-unstable and dumpfile.core
> > > I will investigate more.
> > 
> > Please find "xen-syms-3.3-unstable", attached with the mail.
> > If you need "dumpfile.core" to investigate this issue, please let me know.
> > 
> > In that case, I think I should send you "dumpfile.core" on a disk
> > by post, or upload it to your site, because it is very large.
> > 
> > Thanks,
> > 
> > --
> > Yuji Shimada
> 
> -- 
> Itsuro ODA <oda <at> valinux.co.jp>
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel <at> lists.xensource.com
> http://lists.xensource.com/xen-devel

-- 
Itsuro ODA <oda <at> valinux.co.jp>
