Namhyung Kim | 29 Jun 12:56 2016

[RFC/PATCH v3] ftrace: Reduce size of function graph entries

Currently the ftrace_graph_ent{,_entry} and ftrace_graph_ret{,_entry}
structs can have padding bytes at the end due to the alignment of their
64-bit members.  As these entries are recorded very frequently, the
padding wastes a non-negligible amount of space.  Since the ring buffer
maintains proper alignment for each architecture anyway, just remove
the extra padding using the 'packed' attribute.

  ftrace_graph_ent_entry:  24 -> 20
  ftrace_graph_ret_entry:  48 -> 44

I also moved the 'overrun' field in struct ftrace_graph_ret to
minimize the padding in the middle of the struct.
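
As a minimal user-space illustration of the tail-padding effect (the
structs below are illustrative, not the kernel's actual definitions):

#include <stdio.h>

/* A 64-bit member forces 8-byte struct alignment, so a trailing
 * 32-bit field leaves 4 bytes of tail padding unless packed. */
struct ent_padded {
	unsigned long long func;
	int depth;
};

struct ent_packed {
	unsigned long long func;
	int depth;
} __attribute__((packed));

int main(void)
{
	/* Prints "16 12" on x86_64. */
	printf("%zu %zu\n", sizeof(struct ent_padded),
	       sizeof(struct ent_packed));
	return 0;
}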

Tested on x86_64 only.

Cc: linux-arch@vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 include/linux/ftrace.h       | 12 ++++++++----
 kernel/trace/trace.h         | 11 +++++++++++
 kernel/trace/trace_entries.h |  4 ++--
 3 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index dea12a6e413b..68f01226b7ca 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -753,23 +753,27 @@ static inline void ftrace_init(void) { }

 /*
(Continue reading)

Namhyung Kim | 28 Jun 07:30 2016

[RFC/PATCH v2] ftrace: Reduce size of function graph entries

Currently the ftrace_graph_ent{,_entry} and ftrace_graph_ret{,_entry}
structs can have padding bytes at the end due to the alignment of their
64-bit members.  As these entries are recorded very frequently, the
padding wastes a non-negligible amount of space.  As some architectures
handle unaligned accesses efficiently, reducing the alignment can save
~10% of the data size:

  ftrace_graph_ent_entry:  24 -> 20
  ftrace_graph_ret_entry:  48 -> 44

I also moved the 'overrun' field in struct ftrace_graph_ret to minimize
the padding.  I think FTRACE_ALIGNMENT still needs to provide proper
alignment (even though the ring buffer handles the alignment in the
end), since the ftrace_graph_ent/ret structs live on the stack before
being copied into the ring buffer.
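
To illustrate the kind of mid-struct hole that the reordering avoids
(field names are illustrative, not the kernel's actual layout):

struct ret_holey {
	unsigned long long calltime;
	int depth;			/* 4 bytes, then a 4-byte hole */
	unsigned long long rettime;
	unsigned long overrun;
};					/* 32 bytes on x86_64 */

struct ret_reordered {
	unsigned long long calltime;
	unsigned long long rettime;
	unsigned long overrun;
	int depth;			/* only tail padding remains,
					 * which 'packed' then drops */
};					/* 28 bytes on x86_64 when packed */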

Tested on x86_64 only.

Cc: linux-arch@vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 include/linux/ftrace.h       | 16 ++++++++++++----
 kernel/trace/trace.h         | 11 +++++++++++
 kernel/trace/trace_entries.h |  4 ++--
 3 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index dea12a6e413b..a86cdf167419 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -751,25 +751,33 @@ extern void ftrace_init(void);
(Continue reading)

James Hogan | 25 Jun 00:42 2016
James Hogan <james.hogan@imgtec.com>

[PATCH v4 0/2] kbuild: Remove stale asm-generic wrappers

This patchset attempts to fix kbuild to automatically remove stale
asm-generic wrappers, i.e. to handle the case where files are removed
from generic-y and added directly into arch/*/include/uapi/asm/, but
the stale wrapper in arch/*/include/generated/asm/ continues to be
used.
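
For context, a generic-y wrapper is just a one-line generated header
that redirects to the asm-generic version; the path and header name
below are hypothetical:

/* arch/mips/include/generated/asm/example.h, emitted by kbuild for
 * each header listed in generic-y: */
#include <asm-generic/example.h>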

MIPS was recently burned by this in v4.3 (see patch 2), with continuing
reports of build failures when people upgrade their trees.  The
failures go away once arch/mips/include/generated is removed (or,
reportedly, after a make mrproper/distclean).  It is particularly
irritating during bisection.

Since v2 I've seen other cases of this breaking the MIPS build, and
testing on x86_64 (starting a build first on v4.0 and then on mainline
with this patchset applied) shows one stale generated header:
  REMOVE  arch/x86/include/generated/asm/scatterlist.h

Changes in v4:
- None (resend on Thomas Gleixner's request).

Changes in v3:
- Ensure FORCE actually gets marked .PHONY.

Changes in v2:
- New patch 1 to add tracking of generated headers that aren't generic-y
  wrappers, via generated-y, particularly for x86 (thanks to kbuild test
  robot).
- Rewrite a bit, drawing inspiration from Makefile.headersinst.
- Exclude genhdr-y and generated-y (thanks to kbuild test robot).

James Hogan (2):
  kbuild, x86: Track generated headers with generated-y
(Continue reading)

Dmitry Vyukov | 24 Jun 17:39 2016

[PATCH v3] vmlinux.lds: account for destructor sections

If CONFIG_KASAN is enabled and gcc is configured with
--disable-initfini-array and/or the gold linker is used,
gcc emits .ctors/.dtors and .text.startup/.text.exit
sections instead of .init_array/.fini_array.
The .dtors section is not explicitly accounted for in the linker
script and messes up the vvar/percpu layout.  Want:

ffffffff822bfd80 D _edata
ffffffff822c0000 D __vvar_beginning_hack
ffffffff822c0000 A __vvar_page
ffffffff822c0080 0000000000000098 D vsyscall_gtod_data
ffffffff822c1000 A __init_begin
ffffffff822c1000 D init_per_cpu__irq_stack_union
ffffffff822c1000 A __per_cpu_load
ffffffff822d3000 D init_per_cpu__gdt_page

Got:

ffffffff8279a600 D _edata
ffffffff8279b000 A __vvar_page
ffffffff8279c000 A __init_begin
ffffffff8279c000 D init_per_cpu__irq_stack_union
ffffffff8279c000 A __per_cpu_load
ffffffff8279e000 D __vvar_beginning_hack
ffffffff8279e080 0000000000000098 D vsyscall_gtod_data
ffffffff827ae000 D init_per_cpu__gdt_page

This happens because __vvar_page and .vvar get different
addresses in arch/x86/kernel/vmlinux.lds.S:

(Continue reading)
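
As background on where the .ctors/.dtors and .init_array/.fini_array
sections come from, here is a user-space sketch of the same compiler
mechanism (illustrative; not KASAN's generated code):

#include <stdio.h>

/* gcc places functions marked this way in .init_array/.fini_array,
 * or in .ctors/.dtors when configured with --disable-initfini-array;
 * KASAN's globals instrumentation emits such functions per object
 * file. */
static void __attribute__((constructor)) my_ctor(void)
{
	printf("runs before main()\n");
}

static void __attribute__((destructor)) my_dtor(void)
{
	printf("runs after main() returns\n");
}

int main(void)
{
	return 0;
}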

Dmitry Vyukov | 23 Jun 18:06 2016

[PATCH v2] kasan: account for destructor sections

If CONFIG_KASAN is enabled and gcc is configured with
--disable-initfini-array, gcc emits .ctors/.dtors and
.text.startup/.text.exit sections instead of
.init_array/.fini_array.
The .dtors section is not explicitly accounted for in the linker
script and messes up the vvar/percpu layout.  Want:

ffffffff822bfd80 D _edata
ffffffff822c0000 D __vvar_beginning_hack
ffffffff822c0000 A __vvar_page
ffffffff822c0080 0000000000000098 D vsyscall_gtod_data
ffffffff822c1000 A __init_begin
ffffffff822c1000 D init_per_cpu__irq_stack_union
ffffffff822c1000 A __per_cpu_load
ffffffff822d3000 D init_per_cpu__gdt_page

Got:

ffffffff8279a600 D _edata
ffffffff8279b000 A __vvar_page
ffffffff8279c000 A __init_begin
ffffffff8279c000 D init_per_cpu__irq_stack_union
ffffffff8279c000 A __per_cpu_load
ffffffff8279e000 D __vvar_beginning_hack
ffffffff8279e080 0000000000000098 D vsyscall_gtod_data
ffffffff827ae000 D init_per_cpu__gdt_page

This happens because __vvar_page and .vvar get different
addresses in arch/x86/kernel/vmlinux.lds.S:

(Continue reading)

Dmitry Vyukov | 22 Jun 19:07 2016

[PATCH] kasan: account for new sections when instrumenting globals

When I build the kernel with CONFIG_KASAN and gcc 6 (which instruments
globals and inserts global constructors and destructors), vmlinux
contains some new unaccounted-for sections: .text.exit, .text.startup
and .dtors.  This messes up the vvar/percpu layout.  Want:

ffffffff822bfd80 D _edata
ffffffff822c0000 D __vvar_beginning_hack
ffffffff822c0000 A __vvar_page
ffffffff822c0080 0000000000000098 D vsyscall_gtod_data
ffffffff822c1000 A __init_begin
ffffffff822c1000 D init_per_cpu__irq_stack_union
ffffffff822c1000 A __per_cpu_load
ffffffff822d3000 D init_per_cpu__gdt_page

Got:

ffffffff8279a600 D _edata
ffffffff8279b000 A __vvar_page
ffffffff8279c000 A __init_begin
ffffffff8279c000 D init_per_cpu__irq_stack_union
ffffffff8279c000 A __per_cpu_load
ffffffff8279e000 D __vvar_beginning_hack
ffffffff8279e080 0000000000000098 D vsyscall_gtod_data
ffffffff827ae000 D init_per_cpu__gdt_page

If my reading of the linker script is correct, this happens because
__vvar_page and .vvar get different addresses here:
//arch/x86/kernel/vmlinux.lds.S

	. = ALIGN(PAGE_SIZE);
(Continue reading)

Pan Xinhui | 20 Jun 08:20 2016

[PATCH v3] locking/qrwlock: Let qrwlock has same layout regardless of the endian

This patch aims to get rid of the endianness dependence in
queued_write_unlock().  We want to set __qrwlock->wmode to NULL, but
its address is not &lock->cnts on big-endian machines.  That makes
queued_write_unlock() write NULL to the wrong field of __qrwlock.

Actually, qrwlock can have the same layout regardless of endianness;
IOW, we can remove the #ifdef __LITTLE_ENDIAN in struct __qrwlock.
With this modification, we only need to define the _QW* and _QR*
constants with the corresponding values for each endianness.
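
A user-space sketch of the layout issue (names are illustrative, not
the kernel's exact struct __qrwlock): the low-order byte of the 32-bit
count word sits at offset 0 on little-endian but offset 3 on
big-endian, which is why the old layout had to be switched under an
ifdef for a byte store on wmode to land on the low byte of cnts:

#include <stdint.h>
#include <stdio.h>

union qrw_sketch {
	uint32_t cnts;
	struct {
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
		uint8_t wmode;		/* low byte comes first on LE */
		uint8_t rcnts[3];
#else
		uint8_t rcnts[3];	/* low byte comes last on BE */
		uint8_t wmode;
#endif
	};
};

int main(void)
{
	union qrw_sketch l = { .cnts = 0 };

	l.wmode = 0xff;			/* byte store on the writer field */
	printf("cnts = 0x%08x\n", l.cnts);	/* 0x000000ff either way */
	return 0;
}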

Suggested-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
Acked-by: Waiman Long <Waiman.Long@hpe.com>
---
change from v2:
	change macro's name, add comments.
change from v1:
	A typo fix which is really bad...
	thanks Will for the careful review. :)
---
 include/asm-generic/qrwlock.h | 15 +++++++++++----
 kernel/locking/qrwlock.c      | 11 +++++------
 2 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/include/asm-generic/qrwlock.h b/include/asm-generic/qrwlock.h
index 54a8e65..28fb94a 100644
--- a/include/asm-generic/qrwlock.h
+++ b/include/asm-generic/qrwlock.h
@@ -27,11 +27,18 @@
 /*
(Continue reading)

Andy Lutomirski | 17 Jun 22:00 2016

[PATCH v2 00/13] Virtually mapped stacks with guard pages (x86, core)

Since the dawn of time, a kernel stack overflow has been a real PITA
to debug, has caused nondeterministic crashes some time after the
actual overflow, and has generally been easy to exploit for root.

With this series, arches can enable HAVE_ARCH_VMAP_STACK.  Arches
that enable it (just x86 for now) get virtually mapped stacks with
guard pages.  This causes reliable faults when the stack overflows.
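
A minimal sketch of the idea (the helper name is hypothetical, and the
series' real allocation code is more involved): vmalloc'ed regions are
separated by unmapped guard pages, so an overflowing stack faults
immediately instead of silently corrupting adjacent memory.

#include <linux/vmalloc.h>
#include <linux/sched.h>

/* Hypothetical helper: allocate a task stack from the vmalloc area.
 * Plain vmalloc() already leaves an unmapped guard page after the
 * allocation, which is what turns an overflow into a clean fault. */
static void *alloc_vmap_stack_sketch(void)
{
	return vmalloc(THREAD_SIZE);
}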

If the arch implements it well, we get a nice OOPS on stack overflow
(as opposed to panicking directly or otherwise exploding badly).  On
x86, the OOPS is nice, has a usable call trace, and the overflowing
task is killed cleanly.

On my laptop, this adds about 1.5µs of overhead to task creation,
which seems to be mainly caused by vmalloc inefficiently allocating
individual pages even when a higher-order page is available on the
freelist.

This does not address interrupt stacks.  It also does not address
the possibility of privilege escalation by a controlled stack
overflow that overwrites thread_info without hitting the guard page.
I'll send patches to address the latter issue once this series
lands.

It's worth noting that s390 has an arch-specific gcc feature that
detects stack overflows by adjusting function prologues.  Arches
with features like that may wish to avoid using vmapped stacks to
minimize the performance hit.

Ingo, once this gets a bit more review, would it make sense to
(Continue reading)

Paul E. McKenney | 16 Jun 01:08 2016

[PATCH Documentation/memory-barriers.txt] Clarify limited control-dependency scope

Nothing in the control-dependencies section of memory-barriers.txt
says that control dependencies don't extend beyond the end of the
if-statement containing the control dependency.  Worse yet, in many
situations, they do extend beyond that if-statement.  In particular,
the compiler cannot destroy the control dependency given proper use of
READ_ONCE() and WRITE_ONCE().  However, a weakly ordered system having
a conditional-move instruction provides the control-dependency guarantee
only to code within the scope of the if-statement itself.

This commit therefore adds words and an example demonstrating this
limitation of control dependencies.
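
To paraphrase the limitation with a sketch (an illustration in the
spirit of the patch, not its exact added text):

	q = READ_ONCE(a);
	if (q) {
		WRITE_ONCE(b, 1);  /* ordered after the load from "a" */
	} else {
		WRITE_ONCE(b, 2);  /* ordered after the load from "a" */
	}
	WRITE_ONCE(c, 1);  /* NOT necessarily ordered after the load:
			    * a conditional-move implementation only
			    * guarantees ordering within the
			    * if-statement itself. */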

Reported-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 147ae8ec836f..a4d0a99de04d 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -806,6 +806,41 @@ out-guess your code.  More generally, although READ_ONCE() does force
 the compiler to actually emit code for a given load, it does not force
 the compiler to use the results.

+In addition, control dependencies apply only to the then-clause and
+else-clause of the if-statement in question.  In particular, it does
+not necessarily apply to code following the if-statement:
+
+	q = READ_ONCE(a);
+	if (q) {
(Continue reading)

Ley Foon Tan | 15 Jun 08:49 2016

[PATCH v3] ARM: socfpga: add PCIe to socfpga_defconfig

Enable the Altera PCIe host driver, the Altera MSI driver, and PCIe devices.

CONFIG_PCI=y
CONFIG_PCI_MSI=y
CONFIG_PCIE_ALTERA=y
CONFIG_PCIE_ALTERA_MSI=y
CONFIG_BLK_DEV_NVME=m
CONFIG_E1000E=m
CONFIG_IGB=m
CONFIG_IXGBE=m

Signed-off-by: Ley Foon Tan <lftan@altera.com>
Signed-off-by: Tien Hock Loh <thloh@altera.com>
Cc: Dinh Nguyen <dinguyen@opensource.altera.com>

---
v2: update Dinh's email to opensource.altera.com
v3: update Dinh's email to @kernel.org
---
 arch/arm/configs/socfpga_defconfig | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/arm/configs/socfpga_defconfig b/arch/arm/configs/socfpga_defconfig
index 753f1a5..419d758 100644
--- a/arch/arm/configs/socfpga_defconfig
+++ b/arch/arm/configs/socfpga_defconfig
@@ -19,6 +19,10 @@ CONFIG_MODULE_UNLOAD=y
 # CONFIG_IOSCHED_CFQ is not set
 CONFIG_ARCH_SOCFPGA=y
 CONFIG_ARM_THUMBEE=y
(Continue reading)

H. Peter Anvin | 15 Jun 01:53 2016

cmpxchg and x86 flags output

gcc for x86 now has the ability to return the values of condition
flags as outputs from inline asm.  In most use cases, this has been
trivial to use in the kernel.

However, cmpxchg() presents a problem.  The current definition of
cmpxchg() and its variants is:

	out = cmpxchg(ptr, old, new);

... which is then frequently followed by:

	if (likely(old == out))

... or something along those lines.

This test is unnecessary and can now be elided, but doing so means
changing the signature of the cmpxchg() function (generally a macro).

It seems to me that the sanest way to handle this is to add a new
interface with a fourth parameter, so:

	changed = cmpxchgx(ptr, old, new, out);

A generic implementation of cmpxchgx() would be provided, looking like:

#define cmpxchgx(ptr, old, new, out) ({			\
	__typeof__((*(ptr))) __old = (old);		\
	__typeof__((*(ptr))) __new = (new);		\
	__typeof__((*(ptr))) __out;			\
	(out) = __out = cmpxchg(ptr, __old, __new);	\
(Continue reading)
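
A hypothetical usage sketch of the proposed interface (the generic body
is truncated above, so the assumption here is that the macro evaluates
to true when the exchange succeeded):

	unsigned long cur;

	/* The "old == out" test is folded into the macro's value, which
	 * on x86 can come straight from the flags output. */
	if (cmpxchgx(&x, old, new, cur)) {
		/* exchange happened; cur holds the prior value (== old) */
	} else {
		old = cur;	/* retry with the freshly observed value */
	}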

