Jay Lan | 1 Sep 01:01 2004

Re: [Lse-tech] Re: [PATCH] new CSA patchset for 2.6.8

Adding csa <at> oss.sgi.com, the CSA user group mailing list, to Cc.

Tim Schmielau wrote:
> On Mon, 30 Aug 2004, Guillaume Thouvenin wrote:
> 
> 
>>  Thus, to be clear, the enhanced accounting can be divided into
>>three parts:
>>
>>    1) A common data collection method in the kernel.
>>       We could start from BSD-accounting and add CSA information. Could
>>       it be something like BSD version4?
> 
> 
> I've had a quick look at the CSA data collection patches. To get the 
> discussion started, here are my comments:
> 
> 
>>--- linux.orig/drivers/block/ll_rw_blk.c        2004-08-13 22:36:16.000000000 -0700
>>+++ linux/drivers/block/ll_rw_blk.c     2004-08-18 12:07:10.000000000 -0700
>> @@ -1948,10 +1950,12 @@
>> 
>>        if (rw == READ) {
>>                disk_stat_add(rq->rq_disk, read_sectors, nr_sectors);
>>+               current->rblk += nr_sectors;
>>                if (!new_io)
>>                        disk_stat_inc(rq->rq_disk, read_merges);
>>        } else if (rw == WRITE) {
>>                disk_stat_add(rq->rq_disk, write_sectors, nr_sectors);
>>+               current->wblk += nr_sectors;
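The two added lines charge sectors to `current`, the task that happens to be submitting the block request. A minimal userspace model of that rule follows. `struct task_model`, `account_request()` and the `MODEL_READ`/`MODEL_WRITE` constants are hypothetical names for illustration, not kernel code.

```c
/* Hypothetical userspace model of the patch hunk; not kernel code. */
enum rw_dir { MODEL_READ, MODEL_WRITE };

struct task_model {
    int pid;
    unsigned long rblk;   /* sectors read, charged at request submission */
    unsigned long wblk;   /* sectors written, charged at request submission */
};

/* Charge nr_sectors to whichever task is "current" when the request is
 * submitted, mirroring the two lines added next to disk_stat_add(). */
static void account_request(struct task_model *current_task, enum rw_dir rw,
                            unsigned long nr_sectors)
{
    if (rw == MODEL_READ)
        current_task->rblk += nr_sectors;
    else
        current_task->wblk += nr_sectors;
}
```

The model makes the key assumption visible: whoever is running at submission time gets the charge, whether or not it produced the data.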

John Hesterberg | 1 Sep 23:44 2004

Re: Re: [PATCH] new CSA patchset for 2.6.8

On Tue, Aug 31, 2004 at 11:06:47AM +0200, Guillaume Thouvenin wrote:
> On Fri, Aug 27, 2004 at 12:55:03PM -0700, Jay Lan wrote:
> > Please visit http://oss.sgi.com/projects/pagg/
> > The page has been updated to provide information on a per job
> > accounting project called 'job' based on PAGG.
> > 
> > There is one userspace rpm and one kernel module for job.
> > This may provide what you are looking for. It is a mature product
> > as well. I am sure Limin(job) and Erik(pagg) would appreciate any
> > input you can provide to make 'job' more useful.
> 
>   I have a question about job. If I understand how it works, you cannot
> add a process to a job. I mean, when you start a session, a container is 
> created and that's the only way to create it.

Right, that's the current implementation.  Any privileged process can
create a job, though; it doesn't *have* to be at the start of a session.
I believe job is currently hardwired so that the initial member process is
the creator, and the only other way in is via inheritance; there's
no way out of the job other than exiting or creating your own job.
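A rough model of the job semantics described here. The names `job_create()`, `job_fork()` and `job_exit()` are hypothetical and are not the job module's real interface: creating a job makes the caller its first member, children join only by inheritance at fork, and the only ways out are exiting or creating a new job.

```c
#include <stdbool.h>

/* Hypothetical model of the job membership rules; not the real module. */
struct proc {
    int pid;
    int job;      /* 0 = not in any job */
    bool alive;
};

static int next_job_id = 1;

/* A privileged process creates a job and becomes its initial member.
 * Creating a new job is also the only way to leave one without exiting. */
static int job_create(struct proc *p, bool privileged)
{
    if (!privileged)
        return -1;
    p->job = next_job_id++;
    return p->job;
}

/* The only other way into a job: inheritance at fork. */
static struct proc job_fork(const struct proc *parent, int child_pid)
{
    struct proc child = { child_pid, parent->job, true };
    return child;
}

/* Exiting removes the process from its job. */
static void job_exit(struct proc *p)
{
    p->alive = false;
    p->job = 0;
}
```

Note that there is deliberately no operation taking an arbitrary pid and a job id; that missing call is exactly what the question below is about.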

> If I'm right, I think it could be interesting to be able to add a process
> using an ioctl and /proc interface. 

We're planning on changing that interface, but I think your question
applies regardless of what interface is used.

> For example, if I want to know how resources are used by a
> compilation, I need to add the gcc process to a container. Any
> comments? 

Guillaume Thouvenin | 3 Sep 15:30 2004

Proposal for Enhanced Accounting HOWTO

                       Enhanced Accounting HOWTO
		       =========================

  According to the discussion on the lse-tech mailing list, at least three
steps are required to improve accounting.

1) Improve accounting structure
   ----------------------------

   The current BSD-accounting structure doesn't carry enough information.
Metrics computed by the CSA module can be added to BSD accounting. However,
other discussion (for example Andi Kleen's comment on the patch I wrote to
add CSA I/O values to BSD accounting, http://lkml.org/lkml/2004/8/2/70)
shows that the current method of collecting block/char read/write metrics
is not accurate, since most writes may be charged to pdflush threads rather
than to the process that issued them. One option is to add a counter in
mpage_writepages(), but I don't know whether a process ID can be recovered
from struct page, or whether that would be enough. I'm investigating
whether this is the right approach.
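The attribution problem can be modelled in a few lines. The model assumes that writes are counted when the block request is submitted, and that dirty pages are flushed later by a separate writeback thread. All names are hypothetical. Charging at dirtying time is shown only as one possible alternative. It is not what is proposed above, which is a counter in mpage_writepages(). It can also over-count data that is truncated or deleted before it reaches disk.

```c
/* Hypothetical model: writes are counted at block-request submission,
 * while dirty pages are flushed later by a writeback thread. */
struct acct {
    int pid;
    unsigned long wblk;      /* sectors charged at submission time */
    unsigned long dirtied;   /* sectors charged at dirtying time (alternative) */
};

/* The application only dirties pages in the page cache. */
static void write_to_cache(struct acct *writer, unsigned long sectors,
                           unsigned long *pending)
{
    writer->dirtied += sectors;  /* alternative: charge the real writer now */
    *pending += sectors;         /* data waits in the cache for writeback */
}

/* Writeback runs later in a flusher thread, which is "current" when the
 * block request is submitted, so submit-time charging lands on it. */
static void writeback(struct acct *flusher, unsigned long *pending)
{
    flusher->wblk += *pending;
    *pending = 0;
}
```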

2) Group of processes management
   ------------------------------

   We need to be able to manage groups of processes, since per-job
accounting is clearly a major accounting improvement. I'm not sure "job"
is the right term. Several implementations already exist, and some of them
are already in the kernel. The property needed here is that if a process
is in a container, its children are in the same container. Possible
implementations include:

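The container inheritance property, combined with the explicit "add a process" operation that Guillaume asks about for job, can be sketched as a pid-to-container table. `gpm_add()`, `gpm_fork()` and the fixed `MAX_PID` bound are assumptions for illustration only.

```c
#define MAX_PID 1024

/* container_of_pid[pid] == 0 means "not in any container". */
static int container_of_pid[MAX_PID];

/* Explicit add, e.g. from an administrative interface (hypothetical). */
static int gpm_add(int pid, int container)
{
    if (pid <= 0 || pid >= MAX_PID || container <= 0)
        return -1;
    container_of_pid[pid] = container;
    return 0;
}

/* Fork hook: a child always lands in its parent's container. */
static void gpm_fork(int parent, int child)
{
    if (parent > 0 && parent < MAX_PID && child > 0 && child < MAX_PID)
        container_of_pid[child] = container_of_pid[parent];
}
```

With an ftp server added as pid 123 and sshd as pid 234, every process sshd forks for a login ends up in sshd's container without any further calls.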

Brian Sumner | 3 Sep 16:05 2004

fast userland access to cpu number

Hello Andi,

After some internal discussion about this at SGI, we thought
it would be good to ask for your input.

We have a variety of reasons for needing very fast access
to current_thread_info()->cpu from userland threads, including
statistical analysis, and libraries which need to perform well
on NUMA platforms regardless of the affinity and policies
(or lack thereof) set by its calling threads.

I had hoped that libnuma might provide this, but what I found
was that numa_preferred() falls through to

        /* could read the current CPU from /proc/self/status. Probably 
           not worth it. */
        return 0; /* or random one? */

in the case that the policy is not MPOL_PREFERRED or MPOL_BIND.

Parsing /proc/self/status has been measured at tens of microseconds,
which is unacceptably long.
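For reference, here is a sketch of the slow path being discussed. Per the proc(5) manual page, the CPU a task last ran on is reported as field 39 ("processor") of /proc/self/stat, so this sketch reads that file rather than /proc/self/status. Treat that field position as an assumption about the running kernel. The helper name is hypothetical. Every call opens a file, reads it and parses text, which is why this path costs far more than reading a word of memory.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns the CPU this task last ran on, from field 39 of /proc/self/stat,
 * or -1 on error. The value may already be stale when it is returned. */
static int cpu_from_proc_stat(void)
{
    char buf[4096];
    FILE *f = fopen("/proc/self/stat", "r");
    if (!f)
        return -1;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* Field 2 (comm) is in parentheses and may itself contain spaces or
     * ')', so resume parsing after the LAST ')'. */
    const char *p = strrchr(buf, ')');
    if (!p)
        return -1;
    p++;

    /* Walk space-separated fields; the first one after ')' is field 3. */
    for (int field = 3; *p != '\0'; field++) {
        while (*p == ' ')
            p++;
        if (*p == '\0')
            break;
        if (field == 39)
            return (int)strtol(p, NULL, 10);
        while (*p != ' ' && *p != '\0')
            p++;
    }
    return -1;
}
```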

I'd like to request your suggestions on how best to provide much
faster access to this information.  Here are some options
I've come up with:

. a new flag to sys_get_mempolicy

. ioctl to a cpu device

Jan-Frode Myklebust | 3 Sep 16:12 2004

Re: Proposal for Enhanced Accounting HOWTO

On Fri, Sep 03, 2004 at 03:30:48PM +0200, Guillaume Thouvenin wrote:
>                        Enhanced Accounting HOWTO
> 		       =========================
> 
>   Here is an example of what can be done:
> 
>   1) First, we can add processes to containers using the GPM (group of 
>      processes manager) module. For example, we can add an ftp server with 
>      pid #123 and an ssh daemon with pid #234. Thus, inside the GPM you have:
>          container 1 -> 123
> 	 container 2 -> 234
> 
>   2) Now, a user can log in via ssh, so sshd will create new children. Thus, 
>      inside the GPM you will have something like:
>          container 1 -> 123
> 	 container 2 -> 234 333 334 335

Can/will this support project-based accounting?

i.e. Users being member of several projects. Each project having
multiple users as members. And the user is allowed to decide which
project-id each container is going to run under / be accounted to ?

  -jf
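One way to model this requirement uses a hypothetical membership table and a `container_set_project()` call. Neither exists in GPM or job. A container's owner picks the project it is charged to, but only from the projects that user belongs to.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical membership table: which users may charge which projects. */
struct membership { int uid; int project; };

static const struct membership members[] = {
    { 1000, 10 }, { 1000, 20 },   /* uid 1000 belongs to projects 10 and 20 */
    { 1001, 20 },                 /* uid 1001 belongs to project 20 only */
};

struct container { int id; int uid; int project; };   /* project 0 = unset */

static bool user_in_project(int uid, int project)
{
    for (size_t i = 0; i < sizeof members / sizeof members[0]; i++)
        if (members[i].uid == uid && members[i].project == project)
            return true;
    return false;
}

/* Let the container's owner choose its project, restricted to projects
 * the owner is a member of; the container is unchanged on failure. */
static int container_set_project(struct container *c, int project)
{
    if (!user_in_project(c->uid, project))
        return -1;
    c->project = project;
    return 0;
}
```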

Martin J. Bligh | 3 Sep 16:37 2004

Re: fast userland access to cpu number

> We have a variety of reasons for needing very fast access
> to current_thread_info()->cpu from userland threads, including
> statistical analysis, and libraries which need to perform well
> on NUMA platforms regardless of the affinity and policies
> (or lack thereof) set by its calling threads.

How can this be any use? It could change the instant after you read it.

M.

-------------------------------------------------------
This SF.Net email is sponsored by BEA Weblogic Workshop
FREE Java Enterprise J2EE developer tools!
Get your free copy of BEA WebLogic Workshop 8.1 today.
http://ads.osdn.com/?ad_id=5047&alloc_id=10808&op=click
Andi Kleen | 3 Sep 16:44 2004

Re: fast userland access to cpu number

Hello,

> We have a variety of reasons for needing very fast access
> to current_thread_info()->cpu from userland threads, including
> statistical analysis, and libraries which need to perform well
> on NUMA platforms regardless of the affinity and policies
> (or lack thereof) set by its calling threads.

It does not seem very useful, because if you don't set an
explicit scheduling affinity the CPU can change at any time, and
if you do set the affinity you should already know it, or can
just read it once and cache it.
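The "read once and cache" case can be sketched with the standard `sched_getaffinity()` call and the glibc `cpu_set_t` macros. `CPU_COUNT` needs glibc 2.6 or later, which is an assumption about the build environment. The helper name `pinned_cpu()` is hypothetical. It returns a CPU only when the mask allows exactly one. A cached value goes stale if the caller later changes its affinity, which is the burden Brian raises in his reply.

```c
#define _GNU_SOURCE
#include <sched.h>

/* If the calling thread's affinity mask allows exactly one CPU, return it:
 * that value cannot change until the mask does, so it is safe to cache.
 * Otherwise return -1, since the scheduler may move the thread at any time. */
static int pinned_cpu(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof set, &set) != 0)
        return -1;
    if (CPU_COUNT(&set) != 1)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}
```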

> I'd like to request your suggestions on how best to provide much
> faster access to this information.  Here are some options
> I've come up with:
> 
> . a new flag to sys_get_mempolicy
> 
> . ioctl to a cpu device
> 
> . a new fsyscall
> 
> . a reserved word relative to the thread pointer which
>   the kernel would maintain (affecting glibc)

It doesn't work portably because a lot of architectures
cannot provide per CPU mappings.

On IA64 the new fsyscall looks like the best

Brian Sumner | 3 Sep 16:54 2004

Re: fast userland access to cpu number

"Martin J. Bligh" wrote:
> 
> How can this be any use? It could change the instant after you read it.

For statistical sampling, that doesn't matter.

For library use, this value changes slowly enough,
especially with a NUMA aware scheduler, that it
is extremely useful.  If the thread grabs a snapshot of the
cpu number and uses it to select an object from a pool known
to be close (in some NUMA distance metric sense) to
that CPU, the chances are _extremely_ high that that
object will be close to the thread when it accesses it.

Brian
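Brian's pool-selection idea can be sketched with a hypothetical fixed topology (`cpu_to_node_model`) and a per-node `pools` array. Both are assumptions; a real library would obtain the topology from the platform. The sketch shows that a stale CPU snapshot only costs locality: any pool it returns is still valid to allocate from.

```c
#define NR_CPUS_MODEL  8
#define NR_NODES_MODEL 2

/* Hypothetical topology: CPUs 0-3 on node 0, CPUs 4-7 on node 1. */
static const int cpu_to_node_model[NR_CPUS_MODEL] = { 0, 0, 0, 0, 1, 1, 1, 1 };

struct pool { int node; };

static struct pool pools[NR_NODES_MODEL] = { { 0 }, { 1 } };

/* Pick the pool closest to a (possibly stale) CPU snapshot. If the thread
 * has migrated since the snapshot, the result is merely less local. */
static struct pool *pick_pool(int cpu_snapshot)
{
    if (cpu_snapshot < 0 || cpu_snapshot >= NR_CPUS_MODEL)
        return &pools[0];                /* unknown CPU: fall back to node 0 */
    return &pools[cpu_to_node_model[cpu_snapshot]];
}
```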

Brian Sumner | 3 Sep 17:07 2004

Re: fast userland access to cpu number

Andi Kleen wrote:
> 
> It does not seem very useful, because if you don't set
> explicit scheduling affinity it can change any time and
> when you set the affinity you already should know, or
> can just read once and cache.

Hopefully, my response to Martin suggested some aspects
of its usefulness.  I don't think caching will work for
a library, unless an interface is added to inform it
when the thread changes the affinity.  That seems
like quite a burden on users of the library.

> On IA64 the new fsyscall looks like the best
> option if you really wanted it.

We can certainly provide a patch for this.

> But I guess you first need some good reason why it is needed at all.
> Assuming it was useful for IA64 for some reason, this system
> call should probably be added to other architectures too.

Right.

> I don't think it belongs in get_mempolicy()

I didn't really think so either, but it would avoid
adding a syscall...

Brian

Martin J. Bligh | 3 Sep 17:38 2004

Re: fast userland access to cpu number

> "Martin J. Bligh" wrote:
>> 
>> How can this be any use? It could change the instant after you read it.
> 
> For statistical sampling, that doesn't matter.

Fair enough, but wouldn't that be easier to do by creating an array (inside
the kernel) and logging which CPUs it goes on and off at which time, rather
than polling for it? Migrations should be rare ... copying the currently
maintained sys/user time stuff at migration would work, I'd think ... if you
really need the stats.
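Martin's alternative, logging migrations rather than polling, might look like the following. This is a simplified userspace model with hypothetical names. It attributes the wall-clock interval between consecutive migrations to the CPU the task was on. Martin instead suggests snapshotting the kernel's existing sys/user times at each migration.

```c
#define MAX_CPUS_MODEL 4
#define MAX_EVENTS 16

/* One record per migration: the task arrived on `cpu` at time `t`. */
struct migration { int cpu; unsigned long long t; };

/* Hypothetical per-task log, appended on migration instead of polled. */
struct mig_log {
    struct migration ev[MAX_EVENTS];
    int n;
};

static int log_migration(struct mig_log *ml, int cpu, unsigned long long t)
{
    if (ml->n >= MAX_EVENTS || cpu < 0 || cpu >= MAX_CPUS_MODEL)
        return -1;
    ml->ev[ml->n].cpu = cpu;
    ml->ev[ml->n].t = t;
    ml->n++;
    return 0;
}

/* Attribute each interval between consecutive events to the earlier CPU;
 * the last interval runs until `now`. */
static void time_per_cpu(const struct mig_log *ml, unsigned long long now,
                         unsigned long long out[MAX_CPUS_MODEL])
{
    for (int c = 0; c < MAX_CPUS_MODEL; c++)
        out[c] = 0;
    for (int i = 0; i < ml->n; i++) {
        unsigned long long end = (i + 1 < ml->n) ? ml->ev[i + 1].t : now;
        out[ml->ev[i].cpu] += end - ml->ev[i].t;
    }
}
```

Because migrations are rare, the log stays short, and the statistics cost nothing on the fast path.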

> For library use, this value changes slowly enough,
> especially with a NUMA aware scheduler, that it
> is extremely useful.  If the thread grabs a snapshot of the
> cpu number and uses it to select an object from a pool known
> to be close (in some NUMA distance metric sense) to
> that CPU, the chances are _extremely_ high that that
> object will be close to the thread when it accesses it.

OK ... but why not just ask the kernel for the memory and let it sort it
out? We're atomic whilst in the kernel, and moreover, new allocations
should default to being local anyway.

M.

PS. For mem alloc you'd want the node number, not the cpu number anyway,
which is significantly less likely to change.
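Martin's point that the node number changes less often than the CPU number can be checked on a made-up CPU trace. The topology (two CPUs per node, via a hypothetical `node_of()`) is an assumption; on real hardware the mapping comes from the platform.

```c
#include <stddef.h>

/* Hypothetical topology: two CPUs per node. */
static int node_of(int cpu) { return cpu / 2; }

/* Count how often consecutive samples differ. */
static size_t transitions(const int *v, size_t n)
{
    size_t changes = 0;
    for (size_t i = 1; i < n; i++)
        if (v[i] != v[i - 1])
            changes++;
    return changes;
}

/* Map a trace of CPU numbers to the node trace it implies. */
static void to_nodes(const int *cpus, int *nodes, size_t n)
{
    for (size_t i = 0; i < n; i++)
        nodes[i] = node_of(cpus[i]);
}
```

For the trace `{0, 1, 0, 1, 1, 0, 2, 3, 2}`, the CPU number changes 7 times but the node number changes only once, so a node-keyed cache would be invalidated far less often.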



Gmane