K.Prasad | 3 Oct 2011 09:32

[Patch 1/4][kernel][slimdump] Add new elf-note of type NT_NOCOREDUMP to capture slimdump

There are certain types of crashes induced by faulty hardware in which
capturing the crashing kernel's memory (through kdump) makes no sense (or is
sometimes dangerous).

A case in point is unrecoverable memory errors (resulting in fatal machine
check exceptions), in which reading from the faulty memory location from the
kexec'ed kernel will cause a double fault and a system reset (leaving no
information for the user).

This patch introduces a framework called 'slimdump', enabled through a new
elf-note NT_NOCOREDUMP. The handler for any error whose cause cannot be
attributed to software, and which cannot be detected by analysing kernel
memory, may add this elf-note to the vmcore to indicate that capturing the
full dump is futile. Tools such as 'kexec', 'makedumpfile' and 'crash' are
also modified in tandem to recognise this new elf-note and capture a
'slimdump'.
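For readers unfamiliar with ELF notes, here is a sketch of the generic note layout that a note such as NT_NOCOREDUMP would follow: a three-word header, then a name and a descriptor, each padded to 4 bytes. This is not the code from this patch. The note name "NOCOREDUMP", the type value NT_NOCOREDUMP_EXAMPLE and the helper append_note() are hypothetical placeholders, because the real definitions added to include/linux/elf.h are not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as Elf64_Nhdr from <elf.h>: three 32-bit words. */
struct note_hdr {
	uint32_t n_namesz;	/* length of name, including the trailing NUL */
	uint32_t n_descsz;	/* length of the descriptor payload */
	uint32_t n_type;	/* note type */
};

/* Hypothetical type value; the real constant comes from the patched elf.h. */
#define NT_NOCOREDUMP_EXAMPLE 0x4e4f4344u

/* Round up to the 4-byte alignment used for the name and descriptor fields. */
static size_t note_align(size_t n)
{
	return (n + 3) & ~(size_t)3;
}

/*
 * Append one ELF note (header, padded name, padded descriptor) to buf and
 * return the number of bytes written. Padding bytes are zero-filled.
 * buf must have room for the whole note.
 */
static size_t append_note(unsigned char *buf, const char *name, uint32_t type,
			  const void *desc, uint32_t descsz)
{
	struct note_hdr hdr;
	size_t off = 0;

	assert(buf != NULL && name != NULL);
	hdr.n_namesz = (uint32_t)strlen(name) + 1;
	hdr.n_descsz = descsz;
	hdr.n_type = type;

	memcpy(buf + off, &hdr, sizeof(hdr));
	off += sizeof(hdr);

	memset(buf + off, 0, note_align(hdr.n_namesz));
	memcpy(buf + off, name, hdr.n_namesz);
	off += note_align(hdr.n_namesz);

	memset(buf + off, 0, note_align(descsz));
	if (descsz)
		memcpy(buf + off, desc, descsz);
	off += note_align(descsz);

	return off;
}
```

For example, appending a note named "NOCOREDUMP" (11 bytes with its NUL, padded to 12) and a 5-byte descriptor (padded to 8) produces 12 + 12 + 8 = 32 bytes.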

The physical address and size of the NT_NOCOREDUMP note are made available to
user space through a "/sys/kernel/nt_nocoredump" sysfs file (just like other
kexec-related files).
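Here is a minimal sketch of how a user-space tool might consume this file. It assumes, without confirmation from the patch, that /sys/kernel/nt_nocoredump prints "<hex address> <hex size>" on one line, which is the format /sys/kernel/vmcoreinfo uses. parse_nt_nocoredump() and read_nt_nocoredump() are hypothetical helpers, not kexec-tools functions.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: parse one line in the assumed
 * "<hex address> <hex size>" format. Returns 0 on success and -1 on a
 * malformed line.
 */
static int parse_nt_nocoredump(const char *line, uint64_t *addr, uint64_t *len)
{
	assert(line && addr && len);
	if (sscanf(line, "%" SCNx64 " %" SCNx64, addr, len) != 2)
		return -1;
	return 0;
}

/* Read and parse the sysfs file. Returns -1 if it is absent or malformed. */
static int read_nt_nocoredump(const char *path, uint64_t *addr, uint64_t *len)
{
	char buf[64];
	FILE *fp = fopen(path, "r");

	if (!fp)
		return -1;
	if (!fgets(buf, sizeof(buf), fp)) {
		fclose(fp);
		return -1;
	}
	fclose(fp);
	return parse_nt_nocoredump(buf, addr, len);
}
```

With a hypothetical line "3f2e1000 400", the parser yields an address of 0x3f2e1000 and a size of 0x400. A missing file is reported as -1, so the caller can simply treat the note as absent.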

Signed-off-by: K.Prasad <prasad@...>
---
 arch/x86/kernel/cpu/mcheck/mce.c |   28 ++++++++++++++++++++++++++++
 include/linux/elf.h              |   18 ++++++++++++++++++
 include/linux/kexec.h            |    1 +
 kernel/kexec.c                   |   11 +++++++++++
 kernel/ksysfs.c                  |   10 ++++++++++
 5 files changed, 68 insertions(+), 0 deletions(-)


K.Prasad | 3 Oct 2011 09:35

[Patch 2/4][kexec-tools] Recognise NT_NOCOREDUMP elf-note type

The kernel vmcore may contain a new elf-note of type NT_NOCOREDUMP. Include this
new note, whose address and length are made available at
/sys/kernel/nt_nocoredump, while loading elf-headers.

Signed-off-by: K.Prasad <prasad@...>
---

diff --git a/kexec/crashdump-elf.c b/kexec/crashdump-elf.c
index 8d82db9..b009227 100644
--- a/kexec/crashdump-elf.c
+++ b/kexec/crashdump-elf.c
@@ -39,7 +39,9 @@ int FUNC(struct kexec_info *info,
 	long int nr_cpus = 0;
 	uint64_t notes_addr, notes_len;
 	uint64_t vmcoreinfo_addr, vmcoreinfo_len;
+	uint64_t nt_nocoredump_addr, nt_nocoredump_len;
 	int has_vmcoreinfo = 0;
+	int has_nt_nocoredump = 0;
 	uint64_t vmcoreinfo_addr_xen, vmcoreinfo_len_xen;
 	int has_vmcoreinfo_xen = 0;
 	int (*get_note_info)(int cpu, uint64_t *addr, uint64_t *len);
@@ -57,6 +59,9 @@ int FUNC(struct kexec_info *info,
 		has_vmcoreinfo = 1;
 	}

+	if (get_kernel_nt_nocoredump(&nt_nocoredump_addr, &nt_nocoredump_len) == 0)
+		has_nt_nocoredump = 1;
+
 	if (xen_present() &&
 	    get_xen_vmcoreinfo(&vmcoreinfo_addr_xen, &vmcoreinfo_len_xen) == 0) {

Suzuki K. Poulose | 3 Oct 2011 12:17

[PATCH] Series short description

The following series implements...

---

Suzuki K. Poulose (1):
      kexec: powerpc: crash_dump: No backup region for PPC BookE

 configure.ac                       |    5 +++++
 kexec/arch/ppc/crashdump-powerpc.c |    7 ++++++-
 kexec/arch/ppc/crashdump-powerpc.h |    8 ++++++++
 kexec/arch/ppc/kexec-ppc.c         |    5 +++++
 purgatory/arch/ppc/purgatory-ppc.c |    2 ++
 5 files changed, 26 insertions(+), 1 deletions(-)

--

Suzuki K. Poulose | 3 Oct 2011 12:18

[PATCH] kexec: powerpc: crash_dump: No backup region for PPC BookE

Disable the backup region for BookE in the case of a crash dump, as BookE
kernels can be run from anywhere.

This patch introduces a --with-booke configure option to support BookE.

With the patch, we get:

## On a 256M machine:

# busybox cat /proc/cmdline
init=/bin/init console=ttyS0,16550 crashkernel=128M@100M
# kexec -p root/vmlinux
usable memory rgns size:1 base:6400000 size:8000000
CRASH MEMORY RANGES
0000000000000000-0000000006400000
000000000e400000-0000000010000000
Command line after adding elfcorehdr:  elfcorehdr=112380K
Command line after adding elfcorehdr:  elfcorehdr=112380K savemaxmem=256M
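One way to picture the effect of --with-booke is a compile-time switch that reduces the backup region to nothing. This is a hedged sketch only. It assumes that configure turns --with-booke into a CONFIG_BOOKE define. The non-BookE range BACKUP_SRC_START..BACKUP_SRC_END is a placeholder, not the value used in kexec/arch/ppc/crashdump-powerpc.h, and backup_region_size() is a hypothetical helper.

```c
#include <assert.h>

/* Placeholder range for the non-BookE case; not taken from kexec-tools. */
#define BACKUP_SRC_START 0x0000UL
#define BACKUP_SRC_END   0xffffUL

/* Size of the low-memory backup region the crash setup must preserve. */
static unsigned long backup_region_size(void)
{
#ifdef CONFIG_BOOKE
	/* Assumed BookE path: the kernel can run from anywhere, so nothing is backed up. */
	return 0;
#else
	assert(BACKUP_SRC_END >= BACKUP_SRC_START);
	return BACKUP_SRC_END - BACKUP_SRC_START + 1;
#endif
}
```

Compiled without CONFIG_BOOKE, the sketch reports a 64K backup region. With -DCONFIG_BOOKE it reports 0, so no backup segment would need to be set up.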

Signed-off-by: Suzuki K. Poulose <suzuki@...>
---

 configure.ac                       |    5 +++++
 kexec/arch/ppc/crashdump-powerpc.c |    7 ++++++-
 kexec/arch/ppc/crashdump-powerpc.h |    8 ++++++++
 kexec/arch/ppc/kexec-ppc.c         |    5 +++++
 purgatory/arch/ppc/purgatory-ppc.c |    2 ++
 5 files changed, 26 insertions(+), 1 deletions(-)

diff --git a/configure.ac b/configure.ac

K.Prasad | 3 Oct 2011 14:03

Re: [Patch 1/4][kernel][slimdump] Add new elf-note of type NT_NOCOREDUMP to capture slimdump

On Mon, Oct 03, 2011 at 03:10:43AM -0700, Eric W. Biederman wrote:
> "K.Prasad" <prasad@...> writes:
> 
> > There are certain types of crashes induced by faulty hardware in which
> > capturing crashing kernel's memory (through kdump) makes no sense (or sometimes
> > dangerous).
> >
> > A case in point, is unrecoverable memory errors (resulting in fatal machine
> > check exceptions) in which reading from the faulty memory location from the
> > kexec'ed kernel will cause double fault and system reset (leaving no
> > information for the user).
> 
> It does make plenty of sense, and I capture the all of the time.
> It totally doesn't make sense to do this in the kernel when we can
> filter this from userspace just fine.
> 

It's interesting; according to Intel's Software Developer's Manual
(quoting from Volume 3A, Chapter 15), the MCIP bit in the IA32_MCG_STATUS
MSR behaves as described below.

"MCIP (machine check in progress) flag, bit 2 Indicates (when set)
that a machine-check exception was generated. Software can set or clear this
flag. The occurrence of a second Machine-Check Event while MCIP is set will
cause the processor to enter a shutdown state."

In the do_machine_check() function, we enter the panic path (for
unrecoverable errors) well before the IA32_MCG_STATUS MSR is reset, and
this is likely to be dangerous.
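To make the quoted SDM rule concrete, here is a small sketch that reads IA32_MCG_STATUS (MSR 0x17A) through the Linux msr driver and tests the MCIP bit (bit 2). The helper names are hypothetical. Reading /dev/cpu/N/msr requires root and the msr module. The sketch only inspects the bit; it does not change how do_machine_check() behaves.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define MSR_IA32_MCG_STATUS 0x17a	/* IA32_MCG_STATUS MSR address */
#define MCG_STATUS_MCIP (1ULL << 2)	/* machine check in progress */

/* Read IA32_MCG_STATUS on one CPU via the msr driver (offset = MSR number). */
static int read_mcg_status(int cpu, uint64_t *val)
{
	char path[32];
	ssize_t n;
	int fd;

	snprintf(path, sizeof(path), "/dev/cpu/%d/msr", cpu);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	n = pread(fd, val, sizeof(*val), MSR_IA32_MCG_STATUS);
	close(fd);
	return n == (ssize_t)sizeof(*val) ? 0 : -1;
}

/*
 * Per the SDM text, a second machine-check event while MCIP is set puts
 * the processor into shutdown. Returns true when this status value has MCIP set.
 */
static bool second_mce_would_shutdown(uint64_t mcg_status)
{
	assert(sizeof(mcg_status) == 8);
	return (mcg_status & MCG_STATUS_MCIP) != 0;
}
```

A status value of 0x7 (RIPV, EIPV and MCIP set) would report true. A value of 0x3 (MCIP clear) would report false.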


McClintock Matthew-B29882 | 3 Oct 2011 20:23

Re: [PATCH] kexec: powerpc: crash_dump: No backup region for PPC BookE

On Mon, Oct 3, 2011 at 5:18 AM, Suzuki K. Poulose <suzuki@...> wrote:
> Disable backup regions for BookE in case of a CRASH Dump, as they can
> be run from anywhere.
>
> The patch introduces --with-booke option to support the BookE.
>
> With the patch, we get :
>
> ## On a 256M machine:
>
> # busybox cat /proc/cmdline
> init=/bin/init console=ttyS0,16550 crashkernel=128M@100M
> # kexec -p root/vmlinux
> usable memory rgns size:1 base:6400000 size:8000000
> CRASH MEMORY RANGES
> 0000000000000000-0000000006400000
> 000000000e400000-0000000010000000
> Command line after adding elfcorehdr:  elfcorehdr=112380K
> Command line after adding elfcorehdr:  elfcorehdr=112380K savemaxmem=256M

So there were two crash regions when we only needed the one specified
on the command line?

-M
Suzuki Poulose | 4 Oct 2011 05:21

Re: [PATCH] kexec: powerpc: crash_dump: No backup region for PPC BookE

On 10/03/11 23:53, McClintock Matthew-B29882 wrote:
> On Mon, Oct 3, 2011 at 5:18 AM, Suzuki K. Poulose<suzuki@...>  wrote:
>> Disable backup regions for BookE in case of a CRASH Dump, as they can
>> be run from anywhere.
>>
>> The patch introduces --with-booke option to support the BookE.
>>
>> With the patch, we get :
>>
>> ## On a 256M machine:
>>
>> # busybox cat /proc/cmdline
>> init=/bin/init console=ttyS0,16550 crashkernel=128M@100M
>> # kexec -p root/vmlinux
>> usable memory rgns size:1 base:6400000 size:8000000
>> CRASH MEMORY RANGES
>> 0000000000000000-0000000006400000
>> 000000000e400000-0000000010000000
>> Command line after adding elfcorehdr:  elfcorehdr=112380K
>> Command line after adding elfcorehdr:  elfcorehdr=112380K savemaxmem=256M
>
> So there were two crash regions when we only needed the one specified
> on the command line?
>
Yes, and isn't that expected? The first kernel uses the memory regions
(0-100M) and (228M-256M), while 100M-228M is reserved for the crash kernel.
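The split described above can be checked with a simplified model. It removes the crashkernel=128M@100M reservation from a single 256M RAM bank that starts at address 0. crash_memory_ranges() is a hypothetical helper, not the kexec-tools implementation. It uses half-open [start, end) ranges, which reproduce the two CRASH MEMORY RANGES printed in the patch description.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct mem_range {
	uint64_t start;	/* first byte of the range */
	uint64_t end;	/* one past the last byte */
};

/*
 * Split [0, mem_size) around the reservation [crash_base, crash_base +
 * crash_size). Writes up to two ranges into out and returns how many
 * were produced. Assumes one contiguous RAM bank starting at 0.
 */
static size_t crash_memory_ranges(uint64_t mem_size, uint64_t crash_base,
				  uint64_t crash_size, struct mem_range out[2])
{
	uint64_t crash_end = crash_base + crash_size;
	size_t n = 0;

	assert(crash_end <= mem_size);
	if (crash_base > 0)
		out[n++] = (struct mem_range){ 0, crash_base };
	if (crash_end < mem_size)
		out[n++] = (struct mem_range){ crash_end, mem_size };
	return n;
}
```

Calling it with 256M of RAM, a 100M base and a 128M size yields 0x0-0x6400000 and 0xe400000-0x10000000, which match the two ranges kexec printed.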

Thanks
Suzuki

