Jesse Barnes | 1 Mar 2002 02:34

Re: NUMA scheduling

On Thu, Feb 28, 2002 at 06:41:30PM +0100, Erich Focht wrote:
> This is what I'm getting on a 16 CPU AzusA (Itanium) with Ingo's
> O(1) scheduler based on the version in 2.5.6-pre1. The kernel is 2.4.17,
> otherwise. The results are averages of three measurements. NUMA_EF is my
> version of the node-affinity extensions without memory affinity, i.e. the
> memory is allocated more or less randomly.
> 
> hackbench   Ingo    NUMA_EF
> ---------   -----   -------
> 30          3.058   3.181
> 40          4.599   4.354
> 50          6.412   5.976
> 
> I find it strange that the results you are showing for the Ingo scheduler
> scale so poorly, maybe there is still some basic problem there?

Possibly--TLB flushes are outrageously expensive on our test platform.
Anyway, I ran with your latest patch and came up with this on a 16p
IA64 machine.  The system was somewhat unstable however--upon starting
a hackbench process, the system would sometimes hang.

hackbench   NUMA_EF   2.4.17 stock
---------   -------   ------------
30          4.336     24.026
40          5.800     35.468
50          6.716     47.090

I added the CPU_TO_NODE and SAPICID_TO_NODE macros (easy enough), but
when are you planning to get rid of NR_NODES?  Is it possible to just
use numnodes to kmalloc the arrays at init time?

Hubertus Franke | 1 Mar 2002 03:00

Re: [PATCH] Lightweight userspace semaphores...

On Thu, Feb 28, 2002 at 04:24:22PM -0800, Richard Henderson wrote:
> On Wed, Feb 27, 2002 at 10:53:11AM -0500, Hubertus Franke wrote:
> > As stated above, I allocate a kernel object <kulock_t> on demand and
> > hash it. This way I don't have to pin any user address. What does everybody
> > think about the merit of this approach versus the pinning approach?
> [...]
> > In your case, can the lock be allocated at different
> > virtual addresses in the various address spaces.
> 
> I think this is a relatively important feature.  It may not be
> possible to use the same virtual address in different processes.
> 
> 
> r~

I think so too. However, let me point out that Linus's initial
recommendation of a handle, comprising a kernel pointer and a signature,
also has that property.
Just pointing out the merits of the various approaches.
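
To make the handle idea concrete, here is a minimal userspace sketch of
a handle made of a kernel pointer plus a signature. It is not taken from
any posted patch: the names kulock_pool, ulock_handle and handle_to_lock
are hypothetical, and a fixed pool stands in for whatever allocator the
kernel side would use. The point is only that the handle carries no user
virtual address, so it stays valid no matter where each process maps the
lock word.

```c
#include <stddef.h>
#include <stdint.h>

#define POOL_SIZE 16    /* hypothetical fixed pool of kernel lock objects */

struct kulock {
    uint32_t signature; /* set when the object is handed out */
    int      count;
};

/* What user space would hold: no user virtual address is involved. */
struct ulock_handle {
    struct kulock *kptr;
    uint32_t       signature;
};

static struct kulock kulock_pool[POOL_SIZE];

/*
 * Validate a handle before trusting it: the pointer must refer to a
 * properly aligned pool entry, and the signature must match the one
 * stored in that entry. Returns NULL for anything else.
 */
static struct kulock *handle_to_lock(const struct ulock_handle *h)
{
    uintptr_t p    = (uintptr_t)h->kptr;
    uintptr_t base = (uintptr_t)kulock_pool;
    uintptr_t end  = (uintptr_t)(kulock_pool + POOL_SIZE);

    if (p < base || p >= end || (p - base) % sizeof(struct kulock) != 0)
        return NULL;
    if (h->kptr->signature != h->signature)
        return NULL;
    return h->kptr;
}
```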

-- Hubertus

_______________________________________________
Lse-tech mailing list
Lse-tech <at> lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/lse-tech

Ravikiran G Thirumalai | 1 Mar 2002 15:43

[Patch] Performance improvements with scalable statistics counters

Hello,
We did some measurements by modifying the loopback driver to use scalable
statistics counters, and the results follow...

Recall that this approach uses per-cpu versions of counters instead of a
shared global location. This reduces counter cacheline bouncing and thus
improves SMP performance, which is particularly useful for frequently
updated statistics counters in the kernel. The implementation uses a
per-cpu dynamic word allocator.  For the UP case, the interfaces wrap
onto the usual global shared counter.
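
For readers new to the idea, a minimal userspace sketch of a per-cpu
counter follows. It is not the posted patch: the names statctr,
statctr_inc and statctr_read are hypothetical, a statically sized array
stands in for the per-cpu dynamic word allocator, and the 64-byte cache
line is an assumption.

```c
#define NR_SLOTS   8    /* hypothetical: one slot per CPU */
#define CACHE_LINE 64   /* assumed cache line size in bytes */

/* One counter slot per CPU, padded so no two slots share a cache line. */
struct statctr_slot {
    unsigned long count;
    char pad[CACHE_LINE - sizeof(unsigned long)];
};

struct statctr {
    struct statctr_slot slot[NR_SLOTS];
};

/* Update path: touches only the calling CPU's slot. */
static void statctr_inc(struct statctr *ctr, int cpu)
{
    ctr->slot[cpu].count++;
}

/* Read path: sums all slots. Slower, but reads are assumed to be rare. */
static unsigned long statctr_read(const struct statctr *ctr)
{
    unsigned long sum = 0;

    for (int i = 0; i < NR_SLOTS; i++)
        sum += ctr->slot[i].count;
    return sum;
}
```

The update path never writes to a line owned by another CPU, which is
where the reduction in cacheline bouncing comes from; the cost moves to
the (infrequent) read, which has to visit every slot.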

Results:
-------

1. Running tbench with 40 clients on loopback interface on 8 way smp.
   (Throughput in MB/sec averaged for 10 runs)

Cpus	Vanilla kernel		Statctr mods to lo	% improvement
----	-------------		------------------	-------------
 8	72.69243		78.45725		7.93
 7	71.88495		73.35581		2.04
 6	77.65202		78.95159		1.67
 5	83.02133		90.67332		9.21
 4	83.77254		84.41258		0.76
 3	73.32989		74.73163		1.91
 2	56.27168		57.20956		1.67

2. Kernprof PC sample profile count in loopback_xmit (This is the lo routine
   which is modified to use scalable statistics counters). This was for 4 runs
   of tbench 40 on the loopback interface 

Erich Focht | 1 Mar 2002 19:04

Re: NUMA scheduling

On Thu, 28 Feb 2002, Jesse Barnes wrote:

> Possibly--TLB flushes are outrageously expensive on our test platform.
> Anyway, I ran with your latest patch and came up with this on a 16p
> IA64 machine.  The system was somewhat unstable however--upon starting
> a hackbench process, the system would sometimes hang.
> 
> #	NUMA_EF		2.4.17 stock
> --	-------		------------
> 30	4.336		24.026
> 40	5.800		35.468
> 50	6.716		47.090

OK, that looks reasonable. Did you change IDLE_REBALANCE_TICK back to
1ms? I didn't see any crashes after running 50x hackbench 50...

> I added the CPU_TO_NODE and SAPICID_TO_NODE macros (easy enough), but
> when are you planning to get rid of NR_NODES?  Is it possible to just
> use numnodes to kmalloc the arrays at init time?

I'll change that, it's annoying, you're right. But we don't have numnodes
without NUMA & DISCONTIG, I guess? Anyhow, I can count the nodes if
SAPICID_TO_NODE() works.

I forgot to mention: I changed the initial balancing from do_fork() to
do_exec(), but there is a small mistake in there... The current process
should not be counted when deciding where to move it to.
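
A rough sketch of the intended behaviour, not the code from the patch:
pick_exec_node and the nr_running array are hypothetical names, and the
load metric is simply the number of running tasks per node.

```c
/*
 * Pick the node with the fewest running tasks for the task that is
 * about to exec. The task itself is still counted on cur_node, so it
 * is subtracted first; otherwise its own node looks busier than it is.
 */
static int pick_exec_node(const int *nr_running, int numnodes, int cur_node)
{
    int best = cur_node;
    int best_load = nr_running[cur_node] - 1;   /* exclude the current task */

    for (int node = 0; node < numnodes; node++) {
        int load = nr_running[node];

        if (node == cur_node)
            load -= 1;
        if (load < best_load) {
            best = node;
            best_load = load;
        }
    }
    return best;
}
```

With loads {3, 2, 4} and the task on node 0, this keeps the task where
it is, since node 0 carries only two other tasks. Counting the task
itself would move it to node 1 for no gain.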

Best regards,
Erich

Jesse Barnes | 1 Mar 2002 19:10

Re: NUMA scheduling

On Fri, Mar 01, 2002 at 07:04:06PM +0100, Erich Focht wrote:
> OK, that looks reasonable. Did you change IDLE_REBALANCE_TICK back to
> 1ms? I didn't see any crashes after running 50x hackbench 50...

No, I left it at 50.  I'll try again with 1ms though, and maybe 10ms.

> I'll change that, it's annoying, you're right. But we don't have numnodes
> without NUMA & DISCONTIG, I guess? Anyhow, I can count the nodes if
> SAPICID_TO_NODE() works.

numnodes should be defined for all architectures & configs if I'm
reading mm/numa.c and mm/Makefile right, so it looks safe to use.
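
Something along these lines, sketched in userspace C. The struct and
function names are hypothetical, and calloc stands in for kmalloc plus
memset at init time:

```c
#include <stdlib.h>

/* Hypothetical per-node bookkeeping; the field is illustrative only. */
struct node_info {
    int nr_running;
};

/*
 * Allocate one zeroed entry per detected node instead of declaring a
 * fixed node_info[NR_NODES] array. Returns NULL on a bad node count or
 * on allocation failure.
 */
static struct node_info *alloc_node_info(int numnodes)
{
    if (numnodes <= 0)
        return NULL;
    return calloc((size_t)numnodes, sizeof(struct node_info));
}
```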

> I forgot to mention: I changed the initial balancing from do_fork() to
> do_exec(), but there is a small mistake in there... The current process
> should not be counted when deciding where to move it to.

I'll take a look at that too.

Jesse


Jesse Barnes | 2 Mar 2002 00:06

Re: [Linux-ia64] O(1) scheduler K3+ for IA64

Hey Erich, I've been testing out your latest K3+ patch (along with
yours and Mike's NUMA scheduler changes) and found that it seems less
stable than the old version that used locking for the tlb flush stuff.
I think there's a deadlock somewhere in the new code, since:
2.4.17 + kdb + ia64 + Ingo K3 + old K3+: rock solid
2.4.17 + kdb + ia64 + Ingo K3 + new K3+: sometimes hangs at boot,
  sometimes after a few hackbench processes have run

I'm in the process of trying to figure out exactly why the hangs
happen, but I thought I'd let you know since you might be able to find
out right away.

Thanks,
Jesse

On Thu, Feb 28, 2002 at 07:44:42PM +0100, Erich Focht wrote:
> Hi,
> 
> the latest scheduler from Ingo, included in 2.5.6-pre1, has a
> set_cpus_allowed() function that works for all processes. Here is a
> port to IA64, kernel 2.4.17. Please apply:
>   - kdb-v2.1-2.4.17-common-2
>   - linux-2.4.17-ia64-011226.diff
>   - kdb-v2.1-2.4.17-ia64-011226-1
>   - sched-O1-2.4.17-K3.patch  from http://people.redhat.com/mingo/O(1)-scheduler/
>   - the appended ia64 port with K3+ changes.
> 
> There is a small bugfix included (disable interrupts in
> migration_task) and I changed how the migration tasks are
> distributed across the CPUs. I hope this works for everybody...

Paul Jackson | 2 Mar 2002 00:49

Second version of CpuMemSets patch available on SourceForge

I have uploaded the second version of the CpuMemSets patch
to the LSE project on SourceForge.  It is the CpuMemSets-0.02
Release of the NUMA package.

This is the second release of CpuMemSets.  It contains a
performance improvement for page allocation, a bug fix and
some cleanup.

Special thanks to Sam Ortiz, who is responsible for this update.

CpuMemSets-0.02 changelog
------------------------
   o Page allocation improvement:
     build_zonelist_cms is no longer called. It has been
     replaced by build_mems_allowed, a lighter function 
     that is called less often. The performance overhead
     has been reduced considerably.

   o Bug fix:
     atomic_dec_and_free_cms can now be called. It used to 
     make the system crash. The bug was caused by fork. The
     cms counter was not incremented when duplicating the 
     memory map of the parent process.

   o Various clean-ups:
     - vma->cms is now vma->vm_cms
     - The CMS_MAX* constants are no longer used. They have
       been replaced by the kernel's global definitions
     - atomic_dec_and_free_cms + kmem_cache_free ->
       free_vm_area()
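
The fork bug described in the changelog is a classic reference-count
imbalance. The sketch below illustrates it in userspace C with C11
atomics. It is not the CpuMemSets code: dup_vma and cms_put are
hypothetical names, and only the vm_cms field name comes from the
changelog.

```c
#include <stdatomic.h>
#include <stdlib.h>

struct cms {
    atomic_int count;       /* number of vmas sharing this object */
};

struct vma {
    struct cms *vm_cms;
};

/*
 * Duplicate a parent's vma for the child. The child shares the same
 * cms, so the count must go up here; skipping this increment is the
 * kind of bug that makes a later "dec and free" release the object
 * while the parent still uses it.
 */
static void dup_vma(struct vma *child, const struct vma *parent)
{
    child->vm_cms = parent->vm_cms;
    atomic_fetch_add(&child->vm_cms->count, 1);
}

/* Drop one reference; free on the last one. Returns 1 if freed. */
static int cms_put(struct cms *c)
{
    if (atomic_fetch_sub(&c->count, 1) == 1) {
        free(c);
        return 1;
    }
    return 0;
}
```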

Jesse Barnes | 2 Mar 2002 01:22

Re: [Linux-ia64] O(1) scheduler K3+ for IA64

Since I posted this message I've tested out the new K3+ patch quite a
bit more.  I haven't been able to get it to hang if the machine boots
(even if I run hackbench 100 several times), but I often see hangs
(esp. on > 16p systems) at boot right after the message 'Total of 30
processors activated (20899.60 BogoMIPS).'

Any ideas?

Thanks,
Jesse

On Fri, Mar 01, 2002 at 03:06:22PM -0800, Jesse Barnes wrote:
> Hey Erich, I've been testing out your latest K3+ patch (along with
> yours and Mike's NUMA scheduler changes) and found that it seems less
> stable than the old version that used locking for the tlb flush stuff.
> I think there's a deadlock somewhere in the new code since
> 2.4.17 + kdb + ia64 + Ingo K3 + old K3+: rock solid
> 2.4.17 + kdb + ia64 + Ingo K3 + new K3+: sometimes hangs at boot,
>   sometimes after a few hackbench processes have run
> 
> I'm in the process of trying to figure out exactly why the hangs
> happen, but I thought I'd let you know since you might be able to find
> out right away.
> 
> Thanks,
> Jesse
> 
> On Thu, Feb 28, 2002 at 07:44:42PM +0100, Erich Focht wrote:
> > Hi,
> > 

Hubertus Franke | 2 Mar 2002 15:08

Re: [PATCH] Lightweight userspace semaphores...

On Fri, Mar 01, 2002 at 09:07:18PM +0100, Martin Wirth wrote:
> 
> 
> Hubertus Franke wrote:
> 
> >
> >So, it works and is correct. Hope that clarifies it; if not, let me know.
> >Interestingly enough, this scheme doesn't work for spinning locks.
> >Go to lib/ulocks.c[91-133]; I have integrated this dead code there
> >to show how not to do it. A similar analysis as above shows
> >that this approach wouldn't work. You need cmpxchg for these scenarios
> >(more or less).
> >
> 
> You are right, I falsely assumed the initial state to be [1,1].
> 
> But as mentioned in your README, your approach is currently not able
> to manage signal handling correctly.
> You have to ignore all non-fatal signals by using ERESTARTSYS, and a
> SIGKILL sent to one of the processes
> may corrupt your user-kernel synchronisation.
> 
> I don't think a user space semaphore implementation is acceptable until
> it provides (signal) interruptibility and
> timeouts. But maybe you have some idea how to manage this.
> 
> Martin Wirth
> 
>  


Hubertus Franke | 2 Mar 2002 15:50

Re: [PATCH] Lightweight userspace semaphores...

On Wed, Feb 27, 2002 at 11:24:17AM +1100, Rusty Russell wrote:
> OK.  New implementation.  Compiles (PPC), but due to personal reasons,
> UNTESTED.  Thanks especially to Hubertus for his prior work and
> feedback.
>
> 1) Turned into syscalls.
> 2) Added arch-specific sys_futex_region() to enable mutexes on region,
>    and give a chance to detect lack of support via -ENOSYS.
> 3) Just a single atomic_t variable for mutex (thanks Paul M).
> - This means -ENOSYS on SMP 386s, as we need cmpxchg.

> - We can no longer do generic semaphores: mutexes only.
> - Requires arch __update_count atomic op.

    I don't see that; you can use atomic_inc on 386s.
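
For context, the fast path that the cmpxchg remark in item 3 refers to
looks roughly like the sketch below. It uses C11 atomics in userspace
and is not the code from the patch: the value convention (1 = free,
0 = held) and the function names are assumptions, and the slow path
that would sleep in the kernel is left out.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed convention for this sketch: 1 = free, 0 = held. */

/*
 * Try to take the mutex without entering the kernel. The compare and
 * exchange succeeds only if the word still reads "free", so exactly
 * one contender wins.
 */
static bool mutex_trylock(atomic_int *m)
{
    int expected = 1;

    return atomic_compare_exchange_strong(m, &expected, 0);
}

/* Release the mutex. A real implementation would also wake waiters. */
static void mutex_unlock(atomic_int *m)
{
    atomic_store(m, 1);
}
```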

> 4) Current valid args to sys_futex are -1 and +1: we can implement
>    other lock types by using other values later (eg. rw locks).
>

  (see below)

> Size of futex.c dropped from 244 to 161 lines, so it's clearly 40%
> better!
>
> Get your complaints in now...

  (plenty below :-)

> Rusty.
(Continue reading)

