Alan Cox | 4 Dec 17:19 2005

Re: Re: [PATCH 0/7]: Fix for unsafe notifier chain

On Llu, 2005-11-28 at 19:31 +1100, Keith Owens wrote:
> >Or just don't unregister. That is what I did for the debug notifiers.
> 
> Unregister is not the only problem.  Chain traversal races with
> register as well.

There are some NMI handler registration functions and attempts at safe
code for it in the unmerged experimental part of the bluesmoke
(bluesmoke.sf.net) project that may be useful perhaps ?

Keith Owens | 7 Dec 00:38 2005

Re: Re: [PATCH 0/7]: Fix for unsafe notifier chain

On Sun, 04 Dec 2005 16:19:57 +0000, 
Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
>On Llu, 2005-11-28 at 19:31 +1100, Keith Owens wrote:
>> >Or just don't unregister. That is what I did for the debug notifiers.
>> 
>> Unregister is not the only problem.  Chain traversal races with
>> register as well.
>
>There are some NMI handler registration functions and attempts at safe
>code for it in the unmerged experimental part of the bluesmoke
>(bluesmoke.sf.net) project that may be useful perhaps ?

Thanks Alan, the bluesmoke NMI handlers look very similar to the code
that I have just written.  However bluesmoke only handles a single
notifier chain, it has only one walking_handler_list array.  The kernel
is getting to the stage where it needs multiple notifier chains that
can be traversed without locks.  The patch below against 2.6.15-rc5
gives us lockfree traversal of notifier chains and supports multiple
chains.

The thing that I like about this approach is that the rest of the
kernel is barely affected.  We only have to change the function calls
(adding suffix '_lockfree' and removing any locks in the callers) for
code that needs lockfree traversal.  Other notifier chains are left
alone and there is no need to embed the type of chain in struct
notifier_block.  Even the change to add '_lockfree' can be incremental,
converting chains as required.
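
A minimal userspace sketch of the general idea, not the patch itself: readers walk a singly linked chain with no lock while a writer publishes new entries with a release store. The names `struct nb`, `struct chain`, `chain_register`, `chain_call` and `count_event` are hypothetical stand-ins, C11 atomics replace the kernel's own barrier primitives, and writers are assumed to be serialized by the caller. Unregistering, which needs some way to wait for in-flight readers, is deliberately left out.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a notifier block. */
struct nb {
    int (*call)(struct nb *self, unsigned long event, void *data);
    struct nb *_Atomic next;
    int priority;
};

/* One head pointer per chain, so any number of chains can coexist. */
struct chain {
    struct nb *_Atomic head;
};

/*
 * Insert n in descending priority order.
 * Assumption: callers serialize writers (e.g. with a mutex).
 */
static void chain_register(struct chain *c, struct nb *n)
{
    struct nb *_Atomic *link = &c->head;
    struct nb *cur;

    while ((cur = atomic_load_explicit(link, memory_order_relaxed)) != NULL &&
           cur->priority >= n->priority)
        link = &cur->next;

    /* Fill in the new entry completely before making it reachable. */
    atomic_store_explicit(&n->next, cur, memory_order_relaxed);
    /* Release store: a reader that sees n also sees n->next and n->call. */
    atomic_store_explicit(link, n, memory_order_release);
}

/* Walk the chain without taking any lock; returns the number of callbacks run. */
static int chain_call(struct chain *c, unsigned long event, void *data)
{
    int called = 0;
    struct nb *n = atomic_load_explicit(&c->head, memory_order_acquire);

    while (n != NULL) {
        n->call(n, event, data);
        called++;
        n = atomic_load_explicit(&n->next, memory_order_acquire);
    }
    return called;
}

/* Example callback: counts how many times it was invoked. */
static int count_event(struct nb *self, unsigned long event, void *data)
{
    (void)self;
    (void)event;
    ++*(int *)data;
    return 0;
}
```

A reader racing with `chain_register` sees either the old chain or the new one, never a half-linked entry, which is the property the register race needs.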

Note: This patch has been compiled but not tested yet.  Included for
review and discussion while I debug it.

Keith Owens | 7 Dec 03:43 2005

Re: Re: [PATCH 0/7]: Fix for unsafe notifier chain

On Wed, 07 Dec 2005 10:38:44 +1100, 
Keith Owens <kaos@sgi.com> wrote:
>On Sun, 04 Dec 2005 16:19:57 +0000, 
>Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
>>On Llu, 2005-11-28 at 19:31 +1100, Keith Owens wrote:
>>> >Or just don't unregister. That is what I did for the debug notifiers.
>>> 
>>> Unregister is not the only problem.  Chain traversal races with
>>> register as well.
>>
>>There are some NMI handler registration functions and attempts at safe
>>code for it in the unmerged experimental part of the bluesmoke
>>(bluesmoke.sf.net) project that may be useful perhaps ?
>
>Thanks Alan, the bluesmoke NMI handlers look very similar to the code
>that I have just written.  However bluesmoke only handles a single
>notifier chain, it has only one walking_handler_list array.  The kernel
>is getting to the stage where it needs multiple notifier chains that
>can be traversed without locks.  The patch below against 2.6.15-rc5
>gives us lockfree traversal of notifier chains and supports multiple
>chains.

My previous patch was way too complicated; this one is much simpler.  Based
on Corey Minyard's patch of http://lkml.org/lkml/2004/8/19/140,
generalized to support multiple lockfree notifier chains, with a few
extra synchronization calls added.

Again, for review only.  Compiled but not tested yet.

 include/linux/notifier.h |    7 +++

Shailabh Nagar | 7 Dec 23:08 2005

[RFC][Patch 0/5] Per-task delay accounting

The following patches add accounting for the delays seen by a task in
a) waiting for a CPU (while being runnable)
b) completion of synchronous block I/O initiated by the task
c) swapping in pages (i.e. capacity misses).

Such delays provide feedback for a task's cpu priority, io priority and
rss limit values. Long delays, especially relative to other tasks, can be
a trigger for changing a task's cpu/io priorities and modifying its rss usage
(either directly through sys_getprlimit() that was proposed earlier on lkml or
by throttling cpu consumption or process calling sys_setrlimit etc.)

There are quite a few differences from the earlier posting of these patches
(http://www.uwsg.indiana.edu/hypermail/linux/kernel/0511.1/2275.html):

- block I/O is (hopefully) being accounted properly now instead of just counting the
time spent in io_schedule() as done earlier.

- instead of accounting for time spent in all page faults, only swapping in of pages
is being counted since that's the only part that one can really control (capacity misses
vs. compulsory misses)

- a /proc interface is being used instead of connector-based interface. Andrew Morton
suggested a generic connector-based interface useful for future usage of
connectors for stats. This revised connector-based interface will be posted separately
since it's useful for efficient delivery of any per-task statistics, not just the ones
being introduced by these patches.

- the timestamping code has been made generic (following the suggestions to Matt Helsley's
patches to add timestamps to process events connectors)


Shailabh Nagar | 7 Dec 23:13 2005

[RFC][Patch 1/5] nanosecond timestamps and diffs

Add kernel utility functions for
- nanosecond resolution timestamps, adjusted for lost ticks
- interval (diff) between two such timestamps, in nanoseconds, adjusting
  for overflow
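
A minimal sketch of the interval helper described above, under stated assumptions: the name `timespec_diff_ns` is hypothetical, and "adjusting for overflow" is interpreted here as saturating the result rather than letting the multiplication wrap. The patch itself may handle overflow differently.

```c
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/*
 * Hypothetical helper: signed interval end - start in nanoseconds.
 * Assumes both timestamps are normalized (0 <= tv_nsec < 1e9) and that
 * the seconds fields are close enough that their difference fits in 64 bits.
 * Saturates at INT64_MAX / INT64_MIN instead of overflowing.
 */
static int64_t timespec_diff_ns(const struct timespec *start,
                                const struct timespec *end)
{
    int64_t sec = (int64_t)end->tv_sec - (int64_t)start->tv_sec;
    int64_t nsec = (int64_t)end->tv_nsec - (int64_t)start->tv_nsec;

    /* Leave one second of headroom for the nanosecond remainder. */
    if (sec > (INT64_MAX - NSEC_PER_SEC) / NSEC_PER_SEC)
        return INT64_MAX;
    if (sec < (INT64_MIN + NSEC_PER_SEC) / NSEC_PER_SEC)
        return INT64_MIN;

    return sec * NSEC_PER_SEC + nsec;
}
```

For example, timestamps of 1.9 s and 3.1 s give a seconds difference of 2 and a nanoseconds difference of -800000000, which combine to an interval of 1.2 s.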

The timestamp part of this patch is identical to the one proposed by
Matt Helsley (as part of adding timestamps to process event connectors)
http://www.uwsg.indiana.edu/hypermail/linux/kernel/0512.0/1373.html

Signed-off-by: Shailabh Nagar <nagar@watson.ibm.com>

 include/linux/time.h |   16 ++++++++++++++++
 kernel/time.c        |   22 ++++++++++++++++++++++
 2 files changed, 38 insertions(+)

Index: linux-2.6.15-rc5/include/linux/time.h
===================================================================
--- linux-2.6.15-rc5.orig/include/linux/time.h
+++ linux-2.6.15-rc5/include/linux/time.h
@@ -95,6 +95,7 @@ struct itimerval;
 extern int do_setitimer(int which, struct itimerval *value, struct itimerval *ovalue);
 extern int do_getitimer(int which, struct itimerval *value);
 extern void getnstimeofday (struct timespec *tv);
+extern void getnstimestamp(struct timespec *ts);

 extern struct timespec timespec_trunc(struct timespec t, unsigned gran);

@@ -113,6 +114,21 @@ set_normalized_timespec (struct timespec
 	ts->tv_nsec = nsec;
 }

Shailabh Nagar | 7 Dec 23:15 2005

[RFC][Patch 2/5] Per-task delay accounting: Initialization, dynamic turn on/off

Changes since 11/14/05

- use nanosecond resolution, adjusted wall clock time for timestamps
  instead of sched_clock (akpm, andi, marcelo)
- kernel param, sysctl option to control delay stats collection (parag)
- better CONFIG parameter name (parag)

11/14/05: First post

delayacct-init.patch

Initialization code related to collection of per-task "delay"
statistics which measure how long it had to wait for cpu,
sync block io, swapping etc. The collection of statistics and
the interface are in other patches. This patch sets up the data
structures and enables the statistics collection to be dynamically
enabled (through a kernel boot parameter and through
/proc/sys/kernel/delayacct).
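
A minimal userspace sketch of the pattern this description implies: per-task counters are zeroed at task creation, and every accounting call checks a global switch first. Only `delayacct_on` and `delayacct_tsk_init` are names that appear in the patch excerpts; the struct layout, the `struct task` stand-in, `delayacct_set` and `delayacct_add_cpu` are assumptions made for illustration.

```c
#include <stdbool.h>
#include <string.h>

/* Assumed layout: one accumulated delay per category, in nanoseconds. */
struct task_delay_info {
    unsigned long long cpu_delay_ns;
    unsigned long long blkio_delay_ns;
    unsigned long long swapin_delay_ns;
};

/* Hypothetical stand-in for the part of the task structure that matters here. */
struct task {
    struct task_delay_info delays;
};

/* Global switch; in the patch this is set by a boot parameter or sysctl. */
static bool delayacct_on;

/* Hypothetical setter standing in for the boot parameter / sysctl handler. */
static void delayacct_set(bool on)
{
    delayacct_on = on;
}

/* Called once per new task so that counters start from zero. */
static void delayacct_tsk_init(struct task *t)
{
    memset(&t->delays, 0, sizeof t->delays);
}

/* Accounting is a no-op while collection is switched off. */
static void delayacct_add_cpu(struct task *t, unsigned long long ns)
{
    if (!delayacct_on)
        return;
    t->delays.cpu_delay_ns += ns;
}
```

Checking the flag at every accounting site keeps the cost of disabled accounting down to one branch.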

Signed-off-by: Shailabh Nagar <nagar@watson.ibm.com>

 Documentation/kernel-parameters.txt |    2 ++
 include/linux/delayacct.h           |   26 ++++++++++++++++++++++++++
 include/linux/sched.h               |   11 +++++++++++
 include/linux/sysctl.h              |    1 +
 init/Kconfig                        |   13 +++++++++++++
 kernel/Makefile                     |    1 +
 kernel/delayacct.c                  |   36 ++++++++++++++++++++++++++++++++++++
 kernel/fork.c                       |    2 ++
 kernel/sysctl.c                     |   14 ++++++++++++++

Shailabh Nagar | 7 Dec 23:23 2005

[RFC][Patch 3/5] Per-task delay accounting: Sync block I/O delays

This patch attempts to record all the time spent by a task
waiting for completion of (user-initiated) block I/O. Ideally, it
would have been nice to be able to record the time spent by a task
waiting for I/O events that are related to async block I/O. While
that can be done now (by measuring time spent in wait_for_async_kiocb),
once (if?) network aio is implemented, AFAIK, it won't be possible
to distinguish async block and network aio events (and I suspect async
I/O to pipes too...) so async block I/O gets ignored for now.

Suggestions on how async block I/O wait can be accounted accurately would
be welcome.

Changes since 11/14/05

- use nanosecond resolution, adjusted wall clock time for timestamps
  instead of sched_clock (akpm, andi, marcelo)
- collect stats only if delay accounting enabled (parag)
- stats collected for delays in all userspace-initiated block I/O
including fsync/fdatasync but not counting waits for async block io events.

11/14/05: First post

delayacct-blkio.patch

Record time spent by a task waiting for completion of
userspace initiated synchronous block I/O. This can help
determine the right I/O priority for the task.
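
A minimal userspace sketch of the measurement described above: take a wall-clock timestamp before and after a blocking wait and add the difference to a per-task total. All names here (`struct blkio_delay`, `blkio_delay_account`, `timed_wait`, `fake_io_wait`) are hypothetical, and C11 `timespec_get` stands in for the kernel timestamp helper. Clamping negative intervals to zero is an assumption of this sketch, not something the patch description states.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Assumed per-task record: total wait time and number of waits. */
struct blkio_delay {
    uint64_t total_ns;
    uint32_t count;
};

/* Add one measured wait; a negative interval (clock stepped back) counts as 0. */
static void blkio_delay_account(struct blkio_delay *d,
                                const struct timespec *start,
                                const struct timespec *end)
{
    int64_t ns = ((int64_t)end->tv_sec - (int64_t)start->tv_sec) * 1000000000LL
               + ((int64_t)end->tv_nsec - (int64_t)start->tv_nsec);

    if (ns < 0)
        ns = 0;
    d->total_ns += (uint64_t)ns;
    d->count++;
}

/* Wrap a blocking call with timestamps and charge the elapsed time to d. */
static void timed_wait(struct blkio_delay *d, void (*wait_fn)(void *), void *arg)
{
    struct timespec start, end;

    timespec_get(&start, TIME_UTC);
    wait_fn(arg);                    /* the synchronous I/O wait */
    timespec_get(&end, TIME_UTC);
    blkio_delay_account(d, &start, &end);
}

/* Stand-in for a real blocking I/O wait; returns immediately. */
static void fake_io_wait(void *arg)
{
    (void)arg;
}
```

Keeping the count alongside the total lets a consumer derive an average wait per I/O, which is easier to compare across tasks than a raw sum.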

Signed-off-by: Shailabh Nagar <nagar@watson.ibm.com>


Shailabh Nagar | 7 Dec 23:28 2005

[RFC][Patch 4/5] Per-task delay accounting: Swap in delays

Changes since 11/14/05

- use nanosecond resolution, adjusted wall clock time for timestamps
  instead of sched_clock (akpm, andi, marcelo)
- collect stats only if delay accounting enabled (parag)
- collect delays for only swapin page faults instead of all page faults.

11/14/05: First post

delayacct-swapin.patch

Record time spent by a task waiting for its pages to be swapped in.
This statistic can help in adjusting the rss limits of
tasks (processes), especially relative to each other, when the system is
under memory pressure.

Signed-off-by: Shailabh Nagar <nagar@watson.ibm.com>

 include/linux/delayacct.h |    3 +++
 include/linux/sched.h     |    2 ++
 mm/memory.c               |   16 +++++++++-------
 3 files changed, 14 insertions(+), 7 deletions(-)

Index: linux-2.6.15-rc5/include/linux/delayacct.h
===================================================================
--- linux-2.6.15-rc5.orig/include/linux/delayacct.h
+++ linux-2.6.15-rc5/include/linux/delayacct.h
@@ -20,11 +20,14 @@
 extern int delayacct_on;	/* Delay accounting turned on/off */
 extern void delayacct_tsk_init(struct task_struct *tsk);

