Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces
Michal Ostrowski <mostrows <at> gmail.com>
2009-10-19 13:19:23 GMT
The entire scheme for managing net namespaces seems unsafe. We depend
on synchronization via pn->hash_lock, but have no guarantee of the
existence of the "net" object -- hence no way to ensure the existence
of the lock itself. This should be relatively easy to fix, though, as
we should be able to get/put the net namespace as we add/remove
objects to/from the pppoe hash.
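
To make the get/put idea concrete, here is a minimal userspace sketch.
It is not the actual patch and not kernel code. Every name in it
(toy_net, toy_sock, toy_get_net, toy_put_net, toy_hash_add,
toy_hash_del) is hypothetical; in the kernel the analogous reference
helpers would be get_net()/put_net(). Locking is left out on purpose:
the sketch only shows that each hashed entry pins the namespace object,
so the table (and any lock stored next to it) cannot go away while
entries remain.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_HASH_SIZE 16

struct toy_sock;

/* Hypothetical stand-in for the per-namespace object that owns the hash. */
struct toy_net {
    int refcount;                           /* holders keeping the object alive */
    int freed;                              /* set in place of a real free() */
    struct toy_sock *bucket[TOY_HASH_SIZE]; /* the per-namespace hash table */
};

/* Hypothetical stand-in for a hashed socket entry. */
struct toy_sock {
    unsigned int key;
    struct toy_net *net;    /* reference held for as long as we are hashed */
    struct toy_sock *next;
};

/* Take a reference on the namespace object. */
static struct toy_net *toy_get_net(struct toy_net *net)
{
    net->refcount++;
    return net;
}

/* Drop a reference; the last put marks the object as released. */
static void toy_put_net(struct toy_net *net)
{
    if (--net->refcount == 0)
        net->freed = 1;
}

/* Insert an entry and pin the namespace for the entry's hashed lifetime. */
static void toy_hash_add(struct toy_net *net, struct toy_sock *s)
{
    unsigned int i = s->key % TOY_HASH_SIZE;

    s->net = toy_get_net(net);
    s->next = net->bucket[i];
    net->bucket[i] = s;
}

/* Unlink an entry and drop its pin. Returns 1 if found, 0 otherwise. */
static int toy_hash_del(struct toy_net *net, struct toy_sock *s)
{
    struct toy_sock **pp = &net->bucket[s->key % TOY_HASH_SIZE];

    for (; *pp; pp = &(*pp)->next) {
        if (*pp == s) {
            *pp = s->next;
            s->next = NULL;
            s->net = NULL;
            toy_put_net(net);   /* may be the final reference */
            return 1;
        }
    }
    return 0;
}
```

With this shape, the original owner can drop its own reference early
and the object still survives until the last entry has been removed
from the hash.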
Once you solve this existence issue, the flush_lock can be eliminated
altogether, since all of the relevant code paths already depend on a
write_lock_bh(&pn->hash_lock), and that's the lock that should be used
to protect the pppoe_dev field.
Another patch to follow later...
--
Michal Ostrowski
mostrows <at> gmail.com
On Mon, Oct 19, 2009 at 7:36 AM, Eric Dumazet <eric.dumazet <at> gmail.com> wrote:
> Michal Ostrowski a écrit :
>> Here's my theory on this after an inital look...
>>
>> Looking at the oops report and disassembly of the actual module binary
>> that caused the oops, one can deduce that:
>>
>> Execution was in pppoe_flush_dev(). %ebx contained the pointer "struct
>> pppox_sock *po", which is what we faulted on, executing "cmp %eax, 0x190(%ebx)".
>> %ebx value was 0xffffffff (hence we got "NULL pointer dereference at 0x18f").
>>
>> At this point "i" (stored in %esi) is 15 (valid), meaning that we got a value