Jan Kara | 1 Feb 16:19
Picon

Memory corruption due to word sharing

  Hello,

  we've spotted the following mismatch between what kernel folks expect
from a compiler and what GCC really does, resulting in memory corruption on
some architectures. Consider the following structure:
struct x {
    long a;
    unsigned int b1;
    unsigned int b2:1;
};

We have two processes P1 and P2 where P1 updates field b1 and P2 updates
bitfield b2. The code GCC generates for b2 = 1 e.g. on ia64 is:
   0:   09 00 21 40 00 21       [MMI]       adds r32=8,r32
   6:   00 00 00 02 00 e0                   nop.m 0x0
   c:   11 00 00 90                         mov r15=1;;
  10:   0b 70 00 40 18 10       [MMI]       ld8 r14=[r32];;
  16:   00 00 00 02 00 c0                   nop.m 0x0
  1c:   f1 70 c0 47                         dep r14=r15,r14,32,1;;
  20:   11 00 38 40 98 11       [MIB]       st8 [r32]=r14
  26:   00 00 00 02 00 80                   nop.i 0x0
  2c:   08 00 84 00                         br.ret.sptk.many b0;;

Note that gcc used 64-bit read-modify-write cycle to update b2. Thus if P1
races with P2, update of b1 can get lost. BTW: I've just checked on x86_64
and there GCC uses 8-bit bitop to modify the bitfield.

We actually spotted this race in practice in btrfs on structure
fs/btrfs/ctree.h:struct btrfs_block_rsv where spinlock content got
corrupted due to update of following bitfield and there seem to be other
(Continue reading)

Colin Walters | 1 Feb 17:37
Gravatar

Re: Memory corruption due to word sharing

On Wed, 2012-02-01 at 16:19 +0100, Jan Kara wrote:

> I've raised the issue with our GCC guys and they said to me that: "C does
> not provide such guarantee, nor can you reliably lock different
> structure fields with different locks if they share naturally aligned
> word-size memory regions.  The C++11 memory model would guarantee this,
> but that's not implemented nor do you build the kernel with a C++11
> compiler."

That's interesting.  So it seems like there are two solutions:

1) Use the same lock for a given bitfield
2) Split up the bitfield into different words

--
To unsubscribe from this list: send the line "unsubscribe linux-ia64" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Linus Torvalds | 1 Feb 17:41
Gravatar

Re: Memory corruption due to word sharing

On Wed, Feb 1, 2012 at 7:19 AM, Jan Kara <jack <at> suse.cz> wrote:
>
>  we've spotted the following mismatch between what kernel folks expect
> from a compiler and what GCC really does, resulting in memory corruption on
> some architectures.

This is sad.

We've had something like this before due to architectural reasons
(alpha inability to do byte loads and stores leading us to not be able
to have items smaller than a byte sharing a word).

But at least then there was a *reason* for it, not "the compiler is
being difficult for no good reason".

Actually, the *sad* part is not that the compiler does something
unexpected - that's not new - but the usual "..and the gcc crowd
doesn't care, because they can point to paperwork saying it's not
defined". Even if that same paper is actually in the process of
getting updated exactly because it causes problems like ours.

That mentality is not new, of course.

> So it seems what C/GCC promises does not quite match with what kernel
> expects.

The paper C standard can *never* promise what a kernel expects. There
are tons of unspecified things that a compiler could do, including
moving memory around behind our back as long as it moves it back.
Because it's all "invisible" in the virtual C machine in the absence
(Continue reading)

Linus Torvalds | 1 Feb 17:56
Gravatar

Re: Memory corruption due to word sharing

On Wed, Feb 1, 2012 at 8:37 AM, Colin Walters <walters <at> verbum.org> wrote:
>
> 1) Use the same lock for a given bitfield

That's not the problem. All the *bitfield* fields are all accessed
under the same word already.

> 2) Split up the bitfield into different words

Again, it's not the bitfield that is the problem.

The problem is that the compiler - when it writes to the word that
contains the bitfield - will also corrupt the word *NEXT* to the
bitfield (or before - it probably depends on alignment).

So we have two separate 32-bit words - one of which just happens to
contain a bitfield. We write to these *separate* words using different
locking rules (btw, they don't even need to be protected by a lock: we
may have other rules that protects the individual word contents.

But the compiler turns the access to the bitfield (in a 32-bit aligned
word) into a 64-bit access that accesses the word *next* to it.

That word next to it might *be* the lock, for example.

So we could literally have this kind of situation:

   struct {
      atomic_t counter;
      unsigned int val:4, other:4, data:24;
(Continue reading)

Jiri Kosina | 1 Feb 18:11
Picon

Re: Memory corruption due to word sharing

On Wed, 1 Feb 2012, Linus Torvalds wrote:

> But the compiler turns the access to the bitfield (in a 32-bit aligned
> word) into a 64-bit access that accesses the word *next* to it.
> 
> That word next to it might *be* the lock, for example.
> 
> So we could literally have this kind of situation:
> 
>    struct {
>       atomic_t counter;
>       unsigned int val:4, other:4, data:24;
>    };
> 
> and if we write code like this:
> 
>     spin_lock(&somelock);
>     s->data++;
>     spin_unlock(&somelock);
> 
> and on another CPU we might do
> 
>    atomic_inc(&counter);
> 
> and the access to the bitfield will *corrupt* the atomic counter, even
> though both of them are perfectly fine!
> 
> Quite frankly, if the bug is simply because gcc doesn't actually know
> or care about the underlying size of the bitmask, it is possible that
> we can find a case where gcc clearly is buggy even according to the
(Continue reading)

Linus Torvalds | 1 Feb 18:29
Gravatar

Re: Memory corruption due to word sharing

On Wed, Feb 1, 2012 at 9:08 AM, Torvald Riegel <triegel <at> redhat.com> wrote:
>
> What do the kernel folks think about the C11 memory model?  If you can
> spot any issues in there, the GCC community would certainly like to
> know.

I don't think this is about memory models except very tangentially.

Gcc currently accesses fields in a way that affects *independent*
fields, without checking their memory models at all.

Even original C already has one memory model: "volatile" data is seen
outside the virtual C machine. And Jiri reports that even that
*original* memory model is being violated. We're taling about the one
from about 40 years ago.

So no amount of memory model will ever matter, if gcc can't even get
the 40-year old one right.

Also, the C11 memory model really isn't going to make any difference
to this. Again, because this happens even if we have explicit locking
for the different fields. So they aren't "atomic" or anything like
that, because we do our own (and sufficient) locking. But once the
compiler starts to randomly touch data that just happens to be
adjacent to other data, any memory model about those individual pieces
is going to be entirely immaterial anyway.

Finally: I do not believe for a second that we can actually use the
C11 memory model in the kernel and say "that's sufficient for our
needs". We will continue to have to do things that are "outside the
(Continue reading)

Linus Torvalds | 1 Feb 18:37
Gravatar

Re: Memory corruption due to word sharing

On Wed, Feb 1, 2012 at 9:11 AM, Jiri Kosina <jkosina <at> suse.cz> wrote:
> On Wed, 1 Feb 2012, Linus Torvalds wrote:
>>
>> And I suspect it really is a generic bug that can be shown even with
>> the above trivial example.
>
> I have actually tried exactly this earlier today (because while looking at
> this, I had an idea that putting volatile in place could be a workaround,
> causing gcc to generate a saner code), but it doesn't work either:
>
> # cat x.c
> struct x {
>    long a;
>    volatile unsigned int lock;
>    unsigned int full:1;
> };
>
> void
> wrong(struct x *ptr)
> {
>        ptr->full = 1;
> }
>
> int main()
> {
>        wrong(0);
> }
> # gcc -O2 x.c
> # gdb -q ./a.out
> Reading symbols from /root/a.out...done.
(Continue reading)

Michael Matz | 1 Feb 18:41
Picon

Re: Memory corruption due to word sharing

Hi,

On Wed, 1 Feb 2012, Jiri Kosina wrote:

> # cat x.c
> struct x {
>     long a;
>     volatile unsigned int lock;
>     unsigned int full:1;
> };
> 
> void
> wrong(struct x *ptr)
> {
>         ptr->full = 1;
> }
> 
> In my opinion, this is a clear bug

Even that depends (sadly) on who you ask.  half-volatile objects (i.e. 
struct where some members are volatile and others aren't) are terribly 
underspecified.  You can make the bitfield volatile, and for ia64 that 
would at least result of ld8.acq/st8.rel pairs.

And Linus: don't be so hastily dismissive.  People (even the compiler 
ones) do agree that using an 8 byte access for a 4 byte entity is 
problematic.  Even if allowed by the standard it's a quality of 
implementation problem.

One problem is that it's not a new problem, GCC emitted similar code since 
(Continue reading)

Linus Torvalds | 1 Feb 19:52
Gravatar

Re: Memory corruption due to word sharing

On Wed, Feb 1, 2012 at 9:41 AM, Michael Matz <matz <at> suse.de> wrote:
>
> One problem is that it's not a new problem, GCC emitted similar code since
> about forever, and still they turned up only now (well, probably because
> ia64 is dead, but sparc64 should have similar problems).  The bitfield
> handling code is _terribly_ complex and fixing it is quite involved.  So
> don't expect any quick fixes.

I agree that bitfields are nasty, I've had to handle them myself in
sparse. And we have actually had problems with bitfields before, to
the point where we've replaced use of bitfields with masking and
setting bits by hand.

But I also think that gcc is simply *buggy*, and has made them much
nastier than they should be. What gcc *should* have done is to turn
bitfield accesses into shift-and-masking of the underlying field as
early as possible, and then do all optimizations at that level.

In fact, there is another gcc bug outstanding (48696) where I complain
about absolutely horrible code generation, and that one was actually
the exact same issue except in reverse: gcc wouldn't take the
underlying size of the bitfield into account, and use the wrong
(smaller) size for the access, causing absolutely horrendous code
generation that mixes byte and word accesses, and causes slowdowns by
orders of magnitude.

And it really is the same issue: gcc has forgotten what the base type
is, and tries to "compute" some type using the actual bits. Which is
simply *wrong*. Always has been.

(Continue reading)

Linus Torvalds | 1 Feb 19:57
Gravatar

Re: Memory corruption due to word sharing

On Wed, Feb 1, 2012 at 10:09 AM, David Miller <davem <at> davemloft.net> wrote:
>
> Personally I've avoided C bitfields like the plague in any code I've
> written.

I do agree with that. The kernel largely tries to avoid bitfields,
usually because we have some really strict rules about different
bitfields, but also because initialization of bitfields tends to
result in gcc generating an incredible mess of the code (while
"explicit bits" allows us to just set all the fields in one go, and
know what the result is).

So core kernel data structures tend to be things like "unsigned long",
together with our various bitop functions that have explicit atomicity
(or non-atomicity) guarantees on a bit-per-bit basis.

Sometimes bitfields are really convenient, though, and allow for much
more natural syntax.

I'm not surprised this issue came up in a filesystem, for example.

                          Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-ia64" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Gmane