Martin Schwidefsky | 1 Jun 2004 14:04

Re: [PATCH] ppc64: Fix possible race with set_pte on a present PTE


> The last issue is ptep_establish, we're flushing the pte in do_wp_page
> inside ptep_establish again for no good reason. Those spurious tlb
> flushes may even trigger IPIs (this time in x86 smp too even with
> processes), so I'd really like to remove the explicit flush in
> do_wp_page, however this will likely break s390 but I don't understand
> s390 so I'll leave it broken for now (at least to show you this
> alternative and to hear comments if it's as broken as the previous one).

No, this shouldn't break s390 in any way, removing superfluous tlb flushes
will benefit s390 just like any other architecture.

> The really scary thing about this patch is the s390 ptep_establish.

The s390 version of ptep_establish isn't scary at all, it's just an
optimization. s390 can use the generic set_pte & flush_tlb_page sequence
for ptep_establish without a problem but there is a better way to do it.
We use the ipte instruction because it only flushes the tlb entries for
a single page and not all of them. Don't worry too much about breaking
s390, if you do I will complain.
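
To make the difference in flush granularity concrete, here is a toy userspace model, not kernel code. Every name in it (toy_mmu, toy_establish_flush_all, toy_establish_single) is made up for illustration, and the assumption that the non-selective path drops every cached translation is read out of the paragraph above rather than taken from the s390 sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGES 8              /* hypothetical: a tiny 8-page address space */

typedef unsigned long toy_pte_t; /* stand-in for a real pte value */

struct toy_mmu {
	toy_pte_t page_table[TOY_PAGES]; /* authoritative entries */
	toy_pte_t tlb[TOY_PAGES];        /* cached translations */
	bool tlb_valid[TOY_PAGES];
};

/* Translate through the TLB, filling it from the page table on a miss. */
toy_pte_t toy_translate(struct toy_mmu *mmu, size_t page)
{
	assert(page < TOY_PAGES);
	if (!mmu->tlb_valid[page]) {
		mmu->tlb[page] = mmu->page_table[page];
		mmu->tlb_valid[page] = true;
	}
	return mmu->tlb[page];
}

/* Count how many cached translations are still valid. */
size_t toy_tlb_entries(const struct toy_mmu *mmu)
{
	size_t n = 0;
	for (size_t i = 0; i < TOY_PAGES; i++)
		n += mmu->tlb_valid[i] ? 1 : 0;
	return n;
}

/* Assumed non-selective path: store the pte, then drop every cached entry. */
void toy_establish_flush_all(struct toy_mmu *mmu, size_t page, toy_pte_t pte)
{
	assert(page < TOY_PAGES);
	mmu->page_table[page] = pte;
	for (size_t i = 0; i < TOY_PAGES; i++)
		mmu->tlb_valid[i] = false;
}

/* ipte-like path: invalidate only this page's entry, then store the pte. */
void toy_establish_single(struct toy_mmu *mmu, size_t page, toy_pte_t pte)
{
	assert(page < TOY_PAGES);
	mmu->tlb_valid[page] = false;
	mmu->page_table[page] = pte;
}
```

In this model both paths leave the updated page with a correct translation on the next access; the only difference is how many unrelated cached entries survive the update.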

blue skies,
   Martin

Linux/390 Design & Development, IBM Deutschland Entwicklung GmbH
Schönaicherstr. 220, D-71032 Böblingen, Telefon: 49 - (0)7031 - 16-2247
E-Mail: schwidefsky <at> de.ibm.com


Martin Schwidefsky | 1 Jun 2004 14:10

Re: [PATCH] ppc64: Fix possible race with set_pte on a present PTE


> I did the ppc64 impl, the x86 one (hope I got it right). I still need to
> do ppc32 and I suppose s390 must be fixed now that ptep_establish is gone
> but I'll leave that to someone who understands something about these
> things ;)

At the moment I can't access linux.bkbits.net so I can't test anything but
as far as I can tell s390 should just work as is. ptep_establish is gone
but it has been replaced by correct sequences: ptep_clear_flush & set_pte
and set_pte & flush_tlb_page. The second sequence can be optimized to a
ptep_clear_flush & set_pte if the _PAGE_RO bit has changed. Apart from
that s390 is perfectly fine with the change.

blue skies,
   Martin

Linux/390 Design & Development, IBM Deutschland Entwicklung GmbH
Schönaicherstr. 220, D-71032 Böblingen, Telefon: 49 - (0)7031 - 16-2247
E-Mail: schwidefsky <at> de.ibm.com

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo <at> kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart <at> kvack.org"> aart <at> kvack.org </a>

Dimitri Sivanich | 1 Jun 2004 23:40

Re: Slab cache reap and CPU availability

> 
> Dimitri Sivanich <sivanich <at> sgi.com> wrote:
> >
> > The IA/64 backtrace with all the cruft removed looks as follows:
> > 
> > 0xa000000100149ac0 reap_timer_fnc+0x100
> > 0xa0000001000f4d70 run_timer_softirq+0x2d0
> > 0xa0000001000e9440 __do_softirq+0x200
> > 0xa0000001000e94e0 do_softirq+0x80
> > 0xa000000100017f50 ia64_handle_irq+0x190
> > 
> > The system is running mostly AIM7, but I've seen holdoffs > 30 usec with
> > virtually no load on the system.
> 
> They're pretty low latencies you're talking about there.
> 
> You should be able to reduce the amount of work in that timer handler by
> limiting the size of the per-cpu caches in the slab allocator.  You can do
> that by writing a magic incantation to /proc/slabinfo or:
> 
> --- 25/mm/slab.c~a	Mon May 24 14:51:32 2004
> +++ 25-akpm/mm/slab.c	Mon May 24 14:51:37 2004
> @@ -2642,6 +2642,7 @@ static void enable_cpucache (kmem_cache_
>  	if (limit > 32)
>  		limit = 32;
>  #endif
> +	limit = 8;

I tried several values for this limit, but these had little effect.
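
For anyone who wants to vary the limit without rebuilding, the /proc/slabinfo route mentioned in the quoted text can be scripted. The sketch below assumes the 2.6 write format is a single line of the form "name limit batchcount shared"; that format, the sanity checks, and the example cache name are assumptions to verify against mm/slab.c for the kernel in use. The helper names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build a "name limit batchcount shared" line.
 * Returns the line length, or -1 if the values look invalid or do not fit.
 * The sanity checks are an assumption about what the kernel accepts.
 */
int format_slab_tunables(char *buf, size_t len, const char *cache,
			 int limit, int batchcount, int shared)
{
	if (!buf || !cache || cache[0] == '\0' || strchr(cache, ' '))
		return -1;
	if (limit < 1 || batchcount < 1 || batchcount > limit || shared < 0)
		return -1;
	int n = snprintf(buf, len, "%s %d %d %d\n",
			 cache, limit, batchcount, shared);
	return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Write the line to `path` (normally /proc/slabinfo, which needs root). */
int write_slab_tunables(const char *path, const char *cache,
			int limit, int batchcount, int shared)
{
	char line[128];
	int n = format_slab_tunables(line, sizeof(line), cache,
				     limit, batchcount, shared);
	if (n < 0)
		return -1;
	FILE *f = fopen(path, "w");
	if (!f)
		return -1;
	int ok = fputs(line, f) >= 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

A call such as write_slab_tunables("/proc/slabinfo", "dentry_cache", 8, 4, 0) would then correspond to the limit = 8 experiment for one cache only, whereas the quoted patch changes the limit for every cache.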

Ron Maeder | 5 Jun 2004 21:21

Re: mmap() > phys mem problem

Thanks very much for your response.  I have had some help trying out the 
patch and running recent versions of the kernel.  The problem is not fixed 
in 2.6.6+patch or in 2.6.7-rc2.  Any other suggestions?

Ron

On Sun, 30 May 2004, Nick Piggin wrote:

> Ron Maeder wrote:
>> On Fri, 28 May 2004, Rik van Riel wrote:
>> 
>>> On Tue, 25 May 2004, Ron Maeder wrote:
>>> 
>>>> Is this an "undocumented feature" or is this a linux error?  I would
>>>> expect pages of the mmap()'d file would get paged back to the original
>>>> file. I know this won't be fast, but the performance is not an issue for
>>>> this application.
>>> 
>>> 
>>> It looks like a kernel bug.  Can you reproduce this problem
>>> with the latest 2.6 kernel or is it still there ?
>>> 
>>> Rik
>> 
>> 
>> I was able to reproduce the problem with the code that I posted on a 2.6.6
>> kernel.
>> 
>

Nick Piggin | 6 Jun 2004 03:55

Re: mmap() > phys mem problem

Ron Maeder wrote:
> Thanks very much for your response.  I have had some help trying out the 
> patch and running recent versions of the kernel.  The problem is not 
> fixed in 2.6.6+patch or in 2.6.7-rc2.  Any other suggestions?
> 

OK, NFS is getting stuck in nfs_flush_one => mempool_alloc presumably
waiting for some network IO. Unfortunately at this point, the system
is so clogged up that order 0 GFP_ATOMIC allocations are failing in
this path: netdev_rx => refill_rx => alloc_skb, i.e. deadlock.

Sadly this seems to happen pretty easily here. I don't know the
network layer, so I don't know what might be required to fix it or if
it is even possible.

This doesn't happen so easily with swap enabled (still theoretically
possible), because freeing block device backed memory should be
deadlock free, so you have another avenue to free memory. I assume
you want diskless clients, so this isn't an option.

You could try working around it by upping /proc/sys/vm/min_free_kbytes
maybe to 2048 or 4096.
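
If the workaround has to be applied from a program rather than a shell, something like the following would do. It is a minimal sketch: the helper names are hypothetical, and the only thing taken from the text above is the file /proc/sys/vm/min_free_kbytes and the suggested values.

```c
#include <stdio.h>

/* Write an unsigned value, followed by a newline, to a sysctl-style file. */
int write_ulong_to_file(const char *path, unsigned long value)
{
	FILE *f = fopen(path, "w");
	if (!f)
		return -1;
	int ok = fprintf(f, "%lu\n", value) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Read the current value back, e.g. to log the old setting first. */
int read_ulong_from_file(const char *path, unsigned long *value)
{
	FILE *f = fopen(path, "r");
	if (!f)
		return -1;
	int ok = fscanf(f, "%lu", value) == 1;
	fclose(f);
	return ok ? 0 : -1;
}

/* Raise the setting only if it is currently below `wanted_kb`. */
int raise_min_free_kbytes(const char *path, unsigned long wanted_kb)
{
	unsigned long cur;
	if (read_ulong_from_file(path, &cur) != 0)
		return -1;
	if (cur >= wanted_kb)
		return 0;
	return write_ulong_to_file(path, wanted_kb);
}
```

Run as root, raise_min_free_kbytes("/proc/sys/vm/min_free_kbytes", 4096) has the same effect as echoing 4096 into the file, except that it leaves a larger existing value alone.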


Rik van Riel | 7 Jun 2004 01:51

Re: mmap() > phys mem problem

On Sun, 6 Jun 2004, Nick Piggin wrote:

> OK, NFS is getting stuck in nfs_flush_one => mempool_alloc presumably
> waiting for some network IO. Unfortunately at this point, the system
> is so clogged up that order 0 GFP_ATOMIC allocations are failing in
> this path: netdev_rx => refill_rx => alloc_skb, i.e. deadlock.

I wonder if there simply isn't enough memory available for
GFP_ATOMIC network allocations, or if a mempool would alleviate
the situation here.

> Sadly this seems to happen pretty easily here. I don't know the
> network layer, so I don't know what might be required to fix it or if
> it is even possible.

The theoretically perfect fix is to have a little mempool for
every critical socket.  That is, every NFS mount, e/g/nbd block
device, etc...

Of course, chances are that having one mempool for the network
allocations might already do the trick for 95% 

--

-- 
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan

Nick Piggin | 7 Jun 2004 05:59

Re: mmap() > phys mem problem

Rik van Riel wrote:
> On Sun, 6 Jun 2004, Nick Piggin wrote:
> 
> 
>>OK, NFS is getting stuck in nfs_flush_one => mempool_alloc presumably
>>waiting for some network IO. Unfortunately at this point, the system
>>is so clogged up that order 0 GFP_ATOMIC allocations are failing in
>>this path: netdev_rx => refill_rx => alloc_skb, i.e. deadlock.
> 
> 
> I wonder if there simply isn't enough memory available for
> GFP_ATOMIC network allocations, or if a mempool would alleviate
> the situation here.
> 

Well, no there isn't enough memory available: order 0 allocations
keep failing in the RX path (I assume each time the server retransmits)
and the machine is absolutely deadlocked.

> 
>>Sadly this seems to happen pretty easily here. I don't know the
>>network layer, so I don't know what might be required to fix it or if
>>it is even possible.
> 
> 
> The theoretically perfect fix is to have a little mempool for
> every critical socket.  That is, every NFS mount, e/g/nbd block
> device, etc...
> 


Nirendra Awasthi | 7 Jun 2004 12:21

Determining if process is having core dump

Hi,
	Is there a way for an unrelated process to determine whether another
process is exiting and is in the middle of dumping core?

		In Solaris, this can be determined using libkvm (checking the process
flags for SDOCORE and COREDUMP). Is there a way to do this in Linux 2.6?

	One of the things I observed is that the flags field in /proc/<pid>/stat
(9th attribute) is set to a non-zero value after the process receives a signal
that makes it quit after a core dump (SIGABRT, SIGQUIT etc.). Is that an
indication that the process is going to exit, or what does it indicate?
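
For reference, the 9th field of /proc/<pid>/stat can be read from another process with a few lines of C. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the two bit values (ASSUMED_PF_EXITING, ASSUMED_PF_DUMPCORE) are what I recall from include/linux/sched.h in 2.6, so they must be checked against the headers of the kernel actually in use.

```c
#include <stdio.h>
#include <string.h>

/* Assumed values of the kernel's PF_EXITING / PF_DUMPCORE; verify locally. */
#define ASSUMED_PF_EXITING  0x00000004UL
#define ASSUMED_PF_DUMPCORE 0x00000200UL

/*
 * Extract field 9 (the kernel flags word) from one /proc/<pid>/stat line.
 * The command name in field 2 may contain spaces and parentheses, so
 * parsing starts after the last ')'. Returns 0 on success, -1 on failure.
 */
int parse_stat_flags(const char *line, unsigned long *flags)
{
	const char *p = strrchr(line, ')');
	if (!p)
		return -1;
	/* skip state, ppid, pgrp, session, tty_nr, tpgid; then read flags */
	return sscanf(p + 1, " %*c %*d %*d %*d %*d %*d %lu", flags) == 1 ? 0 : -1;
}

/* Read and parse /proc/<pid>/stat for the given pid. */
int read_stat_flags(int pid, unsigned long *flags)
{
	char path[64], line[1024];
	snprintf(path, sizeof(path), "/proc/%d/stat", pid);
	FILE *f = fopen(path, "r");
	if (!f)
		return -1;
	char *ok = fgets(line, sizeof(line), f);
	fclose(f);
	return ok ? parse_stat_flags(line, flags) : -1;
}
```

A caller would test (flags & ASSUMED_PF_DUMPCORE) after read_stat_flags() succeeds. Note that the check is inherently racy: the target can finish dumping and exit between the read and any action taken on the result.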
	
	Is there some other way to determine this? I don't want to limit the size
of the core file to 0 using ulimit, as the file needs to be analyzed
later.
	Also, if the process receives another signal while it is exiting, the
core dump gets corrupted.

-Nirendra


Rik van Riel | 7 Jun 2004 14:04

Re: mmap() > phys mem problem

On Mon, 7 Jun 2004, Nick Piggin wrote:

> Well, no there isn't enough memory available: order 0 allocations
> keep failing in the RX path (I assume each time the server retransmits)
> and the machine is absolutely deadlocked.

Yes, but did the memory get exhausted by the RX path itself,
or by something else that's allocating the last system memory?

If the memory exhaustion is because of something else, a
mempool for the RX path might alleviate the situation.

> > The theoretically perfect fix is to have a little mempool for
> > every critical socket.  That is, every NFS mount, e/g/nbd block
> > device, etc...

> It would be cool if someone were able to come up with a formula
> to capture that, and allow sockets to be marked as MEMALLOC to
> enable mempool allocation.

A per-socket mempool I guess.  At creation of a MEMALLOC
socket you'd set up the mempool, and the same mempool
would get destroyed when the socket is closed.

Then all memory allocations for that socket go via the
mempool.
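
The lifecycle described above can be sketched in userspace as follows. This is a toy model, not the kernel mempool API: all names (toy_mempool, toy_socket, toy_simulate_oom) are hypothetical, the reserve size of 4 elements of 256 bytes is arbitrary, and malloc stands in for the page allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy reserve pool: keeps up to min_nr preallocated elements for bad times. */
struct toy_mempool {
	size_t min_nr;    /* size of the guaranteed reserve */
	size_t curr_nr;   /* elements currently sitting in the reserve */
	size_t elem_size;
	void **elements;
};

/* Hypothetical switch to simulate the normal allocator running dry. */
bool toy_simulate_oom = false;

static void *toy_backing_alloc(size_t size)
{
	return toy_simulate_oom ? NULL : malloc(size);
}

void toy_mempool_destroy(struct toy_mempool *pool)
{
	if (!pool)
		return;
	while (pool->curr_nr > 0)
		free(pool->elements[--pool->curr_nr]);
	free(pool->elements);
	free(pool);
}

struct toy_mempool *toy_mempool_create(size_t min_nr, size_t elem_size)
{
	assert(min_nr > 0 && elem_size > 0);
	struct toy_mempool *pool = calloc(1, sizeof(*pool));
	if (!pool)
		return NULL;
	pool->min_nr = min_nr;
	pool->elem_size = elem_size;
	pool->elements = calloc(min_nr, sizeof(void *));
	if (!pool->elements) {
		free(pool);
		return NULL;
	}
	while (pool->curr_nr < min_nr) {   /* fill the reserve up front */
		void *e = malloc(elem_size);
		if (!e) {
			toy_mempool_destroy(pool);
			return NULL;
		}
		pool->elements[pool->curr_nr++] = e;
	}
	return pool;
}

/* Try the normal allocator first; dip into the reserve only if it fails. */
void *toy_mempool_alloc(struct toy_mempool *pool)
{
	void *e = toy_backing_alloc(pool->elem_size);
	if (!e && pool->curr_nr > 0)
		e = pool->elements[--pool->curr_nr];
	return e;   /* NULL only when the reserve is exhausted too */
}

/* Refill the reserve before handing memory back to the system. */
void toy_mempool_free(struct toy_mempool *pool, void *e)
{
	if (!e)
		return;
	if (pool->curr_nr < pool->min_nr)
		pool->elements[pool->curr_nr++] = e;
	else
		free(e);
}

/* Socket stand-in: only sockets marked memalloc own a reserve. */
struct toy_socket {
	bool memalloc;
	struct toy_mempool *pool;   /* NULL for ordinary sockets */
};

bool toy_socket_open(struct toy_socket *sk, bool memalloc)
{
	sk->memalloc = memalloc;
	sk->pool = memalloc ? toy_mempool_create(4, 256) : NULL;
	return !memalloc || sk->pool != NULL;
}

void toy_socket_close(struct toy_socket *sk)
{
	toy_mempool_destroy(sk->pool);   /* safe on NULL */
	sk->pool = NULL;
}
```

The point of the model is that a marked socket can still make progress for a bounded number of buffers while the normal allocator fails, provided those buffers are freed back to the pool promptly. Unlike this toy, the kernel's mempool_alloc can, as far as I know, wait for an element to be returned when the caller is allowed to sleep.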

--

-- 
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan

Nirendra Awasthi | 7 Jun 2004 14:22

Re: Determining if process is having core dump

Another problem I noticed: sending SIGABRT (or any other signal that
causes a core dump) to a process that is already dumping core results in a
corrupted core file.

Following is the output while analyzing core with gdb:

Reading symbols from /lib/tls/libc.so.6...done.
Loaded symbols for /lib/tls/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
#0  0xffffe411 in ?? ()

-Nirendra

Nirendra Awasthi wrote:

> Hi,
>     Is there a way for an unrelated process to determine whether another
> process is exiting and is in the middle of dumping core?
> 
>         In Solaris, this can be determined using libkvm (checking the process
> flags for SDOCORE and COREDUMP). Is there a way to do this in Linux 2.6?
> 
>     One of the things I observed is that the flags field in /proc/<pid>/stat
> (9th attribute) is set to a non-zero value after the process receives a signal
> that makes it quit after a core dump (SIGABRT, SIGQUIT etc.). Is that an
> indication that the process is going to exit, or what does it indicate?
>     
>     Is there some other way to determine this. I don't want to limit 
> size of core file to 0 using ulimit, as this file is required to be 