4 Dec 2005 17:19
Re: Re: [PATCH 0/7]: Fix for unsafe notifier chain
Alan Cox <alan <at> lxorguk.ukuu.org.uk>
2005-12-04 16:19:57 GMT
On Llu, 2005-11-28 at 19:31 +1100, Keith Owens wrote:
> >Or just don't unregister. That is what I did for the debug notifiers.
>
> Unregister is not the only problem. Chain traversal races with
> register as well.

There are some NMI handler registration functions and attempts at safe
code for it in the unmerged experimental part of the bluesmoke
(bluesmoke.sf.net) project that may be useful perhaps ?
7 Dec 2005 00:38
Re: Re: [PATCH 0/7]: Fix for unsafe notifier chain
Keith Owens <kaos <at> sgi.com>
2005-12-06 23:38:44 GMT
On Sun, 04 Dec 2005 16:19:57 +0000,
Alan Cox <alan <at> lxorguk.ukuu.org.uk> wrote:
>On Llu, 2005-11-28 at 19:31 +1100, Keith Owens wrote:
>> >Or just don't unregister. That is what I did for the debug notifiers.
>>
>> Unregister is not the only problem. Chain traversal races with
>> register as well.
>
>There are some NMI handler registration functions and attempts at safe
>code for it in the unmerged experimental part of the bluesmoke
>(bluesmoke.sf.net) project that may be useful perhaps ?

Thanks Alan, the bluesmoke NMI handlers look very similar to the code
that I have just written. However, bluesmoke only handles a single
notifier chain; it has only one walking_handler_list array. The kernel
is getting to the stage where it needs multiple notifier chains that
can be traversed without locks. The patch below against 2.6.15-rc5
gives us lockfree traversal of notifier chains and supports multiple
chains.

The thing that I like about this approach is that the rest of the
kernel is barely affected. We only have to change the function calls
(adding suffix '_lockfree' and removing any locks in the callers) for
code that needs lockfree traversal. Other notifier chains are left
alone and there is no need to embed the type of chain in struct
notifier_block. Even the change to add '_lockfree' can be incremental,
converting chains as required.

Note: This patch has been compiled but not tested yet. Included for
review and discussion while I debug it.

[...]
7 Dec 2005 03:43
Re: Re: [PATCH 0/7]: Fix for unsafe notifier chain
Keith Owens <kaos <at> sgi.com>
2005-12-07 02:43:51 GMT
On Wed, 07 Dec 2005 10:38:44 +1100,
Keith Owens <kaos <at> sgi.com> wrote:
>On Sun, 04 Dec 2005 16:19:57 +0000,
>Alan Cox <alan <at> lxorguk.ukuu.org.uk> wrote:
>>On Llu, 2005-11-28 at 19:31 +1100, Keith Owens wrote:
>>> >Or just don't unregister. That is what I did for the debug notifiers.
>>>
>>> Unregister is not the only problem. Chain traversal races with
>>> register as well.
>>
>>There are some NMI handler registration functions and attempts at safe
>>code for it in the unmerged experimental part of the bluesmoke
>>(bluesmoke.sf.net) project that may be useful perhaps ?
>
>Thanks Alan, the bluesmoke NMI handlers look very similar to the code
>that I have just written. However bluesmoke only handles a single
>notifier chain, it has only one walking_handler_list array. The kernel
>is getting to the stage where it needs multiple notifier chains that
>can be traversed without locks. The patch below against 2.6.15-rc5
>gives us lockfree traversal of notifier chains and supports multiple
>chains.

My previous patch was way too complicated; this is much simpler. Based
on Corey Minyard's patch of http://lkml.org/lkml/2004/8/19/140,
generalized to support multiple lockfree notifier chains, with a few
extra synchronization calls added.

Again, for review only. Compiled but not tested yet.

 include/linux/notifier.h |    7 +++
[...]
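The register-versus-traversal race discussed in this thread can be
illustrated with a small userspace sketch. This is not the posted patch:
it assumes C11 atomics as a stand-in for the kernel's barrier
primitives, and the names nb, nb_chain, nb_register, nb_call_chain and
nb_demo are hypothetical. It also assumes writers are serialized by
some outside lock, and it leaves out unregister entirely, since removing
a node safely would need some way to wait for in-flight readers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct notifier_block. */
struct nb {
    int (*call)(struct nb *self, unsigned long event, void *data);
    int priority;                    /* higher priority runs first */
    _Atomic(struct nb *) next;
};

/* A chain is only an atomic head pointer, so many chains can coexist. */
typedef _Atomic(struct nb *) nb_chain;

/* Writer side. Assumption: callers serialize writers externally. */
void nb_register(nb_chain *head, struct nb *n)
{
    _Atomic(struct nb *) *pp = head;
    struct nb *cur;

    /* Find the first node with a lower priority than the new one. */
    while ((cur = atomic_load_explicit(pp, memory_order_relaxed)) != NULL &&
           cur->priority >= n->priority)
        pp = &cur->next;

    /* Finish initializing the node before it becomes reachable. */
    atomic_store_explicit(&n->next, cur, memory_order_relaxed);
    /* Release store publishes it; pairs with the acquire loads below. */
    atomic_store_explicit(pp, n, memory_order_release);
}

/* Reader side: takes no lock and tolerates a concurrent nb_register(). */
int nb_call_chain(nb_chain *head, unsigned long event, void *data)
{
    int calls = 0;
    struct nb *cur = atomic_load_explicit(head, memory_order_acquire);

    while (cur != NULL) {
        cur->call(cur, event, data);
        calls++;
        cur = atomic_load_explicit(&cur->next, memory_order_acquire);
    }
    return calls;
}

/* Usage example: record the order in which handlers ran. */
static int seen[4];
static int nseen;

int record(struct nb *self, unsigned long event, void *data)
{
    (void)event;
    (void)data;
    if (nseen < 4)
        seen[nseen++] = self->priority;
    return 0;
}

int nb_demo(void)
{
    static struct nb low  = { .call = record, .priority = 1 };
    static struct nb high = { .call = record, .priority = 5 };
    nb_chain chain;
    int calls;

    atomic_init(&chain, NULL);
    nseen = 0;
    nb_register(&chain, &low);
    nb_register(&chain, &high);      /* lands ahead of `low` */
    calls = nb_call_chain(&chain, 0, NULL);
    assert(seen[0] == 5 && seen[1] == 1);
    return calls;
}
```

The point of the sketch is that a reader sees either the old list or the
new one, never a half-linked node, because the node's next pointer is
written before the node is published.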
7 Dec 2005 23:08
[RFC][Patch 0/5] Per-task delay accounting
Shailabh Nagar <nagar <at> watson.ibm.com>
2005-12-07 22:08:05 GMT
The following patches add accounting for the delays seen by a task in
a) waiting for a CPU (while being runnable)
b) completion of synchronous block I/O initiated by the task
c) swapping in pages (i.e. capacity misses).

Such delays provide feedback for a task's cpu priority, io priority and
rss limit values. Long delays, especially relative to other tasks, can
be a trigger for changing a task's cpu/io priorities and modifying its
rss usage (either directly through sys_getprlimit() that was proposed
earlier on lkml or by throttling cpu consumption or process calling
sys_setrlimit etc.)

There are quite a few differences from the earlier posting of these
patches (http://www.uwsg.indiana.edu/hypermail/linux/kernel/0511.1/2275.html):

- block I/O is (hopefully) being accounted properly now instead of just
  counting the time spent in io_schedule() as done earlier.

- instead of accounting for time spent in all page faults, only swapping
  in of pages is being counted since that's the only part that one can
  really control (capacity misses vs. compulsory misses)

- a /proc interface is being used instead of connector-based interface.
  Andrew Morton suggested a generic connector-based interface useful for
  future usage of connectors for stats. This revised connector-based
  interface will be posted separately since it's useful for efficient
  delivery of any per-task statistics, not just the ones being
  introduced by these patches.

- the timestamping code has been made generic (following the suggestions
  to Matt Helsley's patches to add timestamps to process events
  connectors)

[...]
7 Dec 2005 23:13
[RFC][Patch 1/5] nanosecond timestamps and diffs
Shailabh Nagar <nagar <at> watson.ibm.com>
2005-12-07 22:13:01 GMT
Add kernel utility functions for
- nanosecond resolution timestamps, adjusted for lost ticks
- interval (diff) between two such timestamps, in nanoseconds, adjusting
  for overflow

The timestamp part of this patch is identical to the one proposed by
Matt Helsley (as part of adding timestamps to process event connectors)
http://www.uwsg.indiana.edu/hypermail/linux/kernel/0512.0/1373.html

Signed-off-by: Shailabh Nagar <nagar <at> watson.ibm.com>

 include/linux/time.h |   16 ++++++++++++++++
 kernel/time.c        |   22 ++++++++++++++++++++++
 2 files changed, 38 insertions(+)

Index: linux-2.6.15-rc5/include/linux/time.h
===================================================================
--- linux-2.6.15-rc5.orig/include/linux/time.h
+++ linux-2.6.15-rc5/include/linux/time.h
@@ -95,6 +95,7 @@ struct itimerval;
 extern int do_setitimer(int which, struct itimerval *value, struct itimerval *ovalue);
 extern int do_getitimer(int which, struct itimerval *value);
 extern void getnstimeofday (struct timespec *tv);
+extern void getnstimestamp(struct timespec *ts);
 
 extern struct timespec timespec_trunc(struct timespec t, unsigned gran);
 
@@ -113,6 +114,21 @@ set_normalized_timespec (struct timespec
 	ts->tv_nsec = nsec;
 }
[...]
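The interval helper itself is cut off in this listing, so the following
is only a userspace sketch of what a nanosecond diff between two
timespec values could look like. The names ts_diff_ns and ts_demo are
hypothetical, the inputs are assumed to be normalized (0 <= tv_nsec <
1e9), and the choice to clamp a reversed interval to 0 and an oversized
one to INT64_MAX is one possible reading of "adjusting for overflow",
not necessarily what the patch does.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Hypothetical helper: nanoseconds elapsed from *start to *end. */
int64_t ts_diff_ns(const struct timespec *start, const struct timespec *end)
{
    int64_t sec  = (int64_t)end->tv_sec  - (int64_t)start->tv_sec;
    int64_t nsec = (int64_t)end->tv_nsec - (int64_t)start->tv_nsec;

    /* Borrow one second when the nanosecond field went backwards. */
    if (nsec < 0) {
        sec  -= 1;
        nsec += NSEC_PER_SEC;
    }
    /* Assumption: an end time before the start counts as no delay. */
    if (sec < 0)
        return 0;
    /* Clamp rather than overflow the signed 64-bit result. */
    if (sec > (INT64_MAX - nsec) / NSEC_PER_SEC)
        return INT64_MAX;
    return sec * NSEC_PER_SEC + nsec;
}

/* Usage example with fixed inputs: 1.9 s to 3.1 s is 1.2 s. */
int64_t ts_demo(void)
{
    struct timespec start = { .tv_sec = 1, .tv_nsec = 900000000 };
    struct timespec end   = { .tv_sec = 3, .tv_nsec = 100000000 };

    assert(ts_diff_ns(&end, &start) == 0);   /* reversed order clamps */
    return ts_diff_ns(&start, &end);
}
```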
7 Dec 2005 23:15
[RFC][Patch 2/5] Per-task delay accounting: Initialization, dynamic turn on/off
Shailabh Nagar <nagar <at> watson.ibm.com>
2005-12-07 22:15:52 GMT
Changes since 11/14/05

- use nanosecond resolution, adjusted wall clock time for timestamps
  instead of sched_clock (akpm, andi, marcelo)
- kernel param, sysctl option to control delay stats collection (parag)
- better CONFIG parameter name (parag)

11/14/05: First post

delayacct-init.patch

Initialization code related to collection of per-task "delay"
statistics which measure how long it had to wait for cpu, sync block
io, swapping etc. The collection of statistics and the interface are in
other patches. This patch sets up the data structures and enables the
statistics collection to be dynamically enabled (through a kernel boot
parameter and through /proc/sys/kernel/delayacct).

Signed-off-by: Shailabh Nagar <nagar <at> watson.ibm.com>

 Documentation/kernel-parameters.txt |    2 ++
 include/linux/delayacct.h           |   26 ++++++++++++++++++++++++++
 include/linux/sched.h               |   11 +++++++++++
 include/linux/sysctl.h              |    1 +
 init/Kconfig                        |   13 +++++++++++++
 kernel/Makefile                     |    1 +
 kernel/delayacct.c                  |   36 ++++++++++++++++++++++++++++++++++++
 kernel/fork.c                       |    2 ++
 kernel/sysctl.c                     |   14 ++++++++++++++
[...]
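A rough userspace sketch of the idea described above: per-task counters
plus a global switch that gates collection at run time. Only the
variable name delayacct_on appears elsewhere in this series; the struct
layout, field names, and the helpers delay_info_init, delay_add and
delayacct_demo are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Global switch; in the series it is set by a boot parameter or sysctl. */
int delayacct_on = 0;

/* Hypothetical per-task record, one counter per delay category. */
struct task_delay_info {
    uint64_t cpu_wait_ns;      /* runnable but waiting for a CPU */
    uint64_t blkio_wait_ns;    /* waiting for sync block I/O */
    uint64_t swapin_wait_ns;   /* waiting for pages to be swapped in */
};

/* Zero the counters when a task is created. */
void delay_info_init(struct task_delay_info *d)
{
    memset(d, 0, sizeof(*d));
}

/* Add a sample to one counter, but only while accounting is enabled. */
void delay_add(uint64_t *counter, uint64_t ns)
{
    if (!delayacct_on)
        return;
    *counter += ns;
}

/* Usage example: samples taken while the switch is off are dropped. */
uint64_t delayacct_demo(void)
{
    struct task_delay_info d;

    delay_info_init(&d);
    delayacct_on = 0;
    delay_add(&d.cpu_wait_ns, 500);    /* ignored */
    delayacct_on = 1;
    delay_add(&d.cpu_wait_ns, 700);    /* counted */
    assert(d.blkio_wait_ns == 0 && d.swapin_wait_ns == 0);
    return d.cpu_wait_ns;
}
```

Checking the switch on every sample keeps the cost of a disabled
feature down to one test of a global variable.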
7 Dec 2005 23:23
[RFC][Patch 3/5] Per-task delay accounting: Sync block I/O delays
Shailabh Nagar <nagar <at> watson.ibm.com>
2005-12-07 22:23:10 GMT
This patch attempts to record all the time spent by a task waiting for
completion of (user-initiated) block I/O.

Ideally, it would have been nice to be able to record the time spent by
a task waiting for I/O events that are related to async block I/O.
While that can be done now (by measuring time spent in
wait_for_async_kiocb), once (if ?) network aio is implemented, AFAIK, it
won't be possible to distinguish async block and network aio events
(and I suspect async I/O to pipes too...) so async block I/O gets
ignored for now. Suggestions on how async block I/O wait can be
accounted accurately would be welcome.

Changes since 11/14/05

- use nanosecond resolution, adjusted wall clock time for timestamps
  instead of sched_clock (akpm, andi, marcelo)
- collect stats only if delay accounting enabled (parag)
- stats collected for delays in all userspace-initiated block I/O
  including fsync/fdatasync but not counting waits for async block io
  events.

11/14/05: First post

delayacct-blkio.patch

Record time spent by a task waiting for completion of userspace
initiated synchronous block I/O. This can help determine the right I/O
priority for the task.

Signed-off-by: Shailabh Nagar <nagar <at> watson.ibm.com>

[...]
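The measurement described above amounts to bracketing a blocking wait
with two timestamps and adding the difference to a per-task total. A
minimal sketch under stated assumptions: the struct blkio_acct, the
helpers blkio_start and blkio_end, and the injected clock_ns_fn hook are
all hypothetical, and the fake clock exists only to keep the example
deterministic. Real code would read a nanosecond clock instead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical clock hook returning a nanosecond timestamp. */
typedef uint64_t (*clock_ns_fn)(void);

struct blkio_acct {
    uint64_t start_ns;   /* timestamp taken when the wait began */
    uint64_t total_ns;   /* accumulated time spent blocked */
    uint64_t count;      /* number of completed waits */
};

/* Call just before the task blocks on synchronous I/O. */
void blkio_start(struct blkio_acct *a, clock_ns_fn now)
{
    a->start_ns = now();
}

/* Call right after the task wakes up again. */
void blkio_end(struct blkio_acct *a, clock_ns_fn now)
{
    uint64_t end = now();

    /* Skip the sample if the clock appears to have gone backwards. */
    if (end > a->start_ns)
        a->total_ns += end - a->start_ns;
    a->count++;
}

/* Fake clock for the example: advances 250 ns on every read. */
static uint64_t fake_time;

uint64_t fake_clock(void)
{
    fake_time += 250;
    return fake_time;
}

/* Usage example: two waits of 250 ns each. */
uint64_t blkio_demo(void)
{
    struct blkio_acct a = { 0, 0, 0 };

    blkio_start(&a, fake_clock);
    blkio_end(&a, fake_clock);
    blkio_start(&a, fake_clock);
    blkio_end(&a, fake_clock);
    assert(a.count == 2);
    return a.total_ns;
}
```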
7 Dec 2005 23:28
[RFC][Patch 4/5] Per-task delay accounting: Swap in delays
Shailabh Nagar <nagar <at> watson.ibm.com>
2005-12-07 22:28:27 GMT
Changes since 11/14/05

- use nanosecond resolution, adjusted wall clock time for timestamps
  instead of sched_clock (akpm, andi, marcelo)
- collect stats only if delay accounting enabled (parag)
- collect delays for only swapin page faults instead of all page faults.

11/14/05: First post

delayacct-swapin.patch

Record time spent by a task waiting for its pages to be swapped in.
This statistic can help in adjusting the rss limits of tasks (process),
especially relative to each other, when the system is under memory
pressure.

Signed-off-by: Shailabh Nagar <nagar <at> watson.ibm.com>

 include/linux/delayacct.h |    3 +++
 include/linux/sched.h     |    2 ++
 mm/memory.c               |   16 +++++++++-------
 3 files changed, 14 insertions(+), 7 deletions(-)

Index: linux-2.6.15-rc5/include/linux/delayacct.h
===================================================================
--- linux-2.6.15-rc5.orig/include/linux/delayacct.h
+++ linux-2.6.15-rc5/include/linux/delayacct.h
@@ -20,11 +20,14 @@ extern int delayacct_on; /* Delay accounting turned on/off */
 extern void delayacct_tsk_init(struct task_struct *tsk);
[...]