Alexey Dobriyan | 1 Mar 2009 02:33

Re: How much of a mess does OpenVZ make? ;) Was: What can OpenVZ do?

On Fri, Feb 27, 2009 at 01:31:12AM +0300, Alexey Dobriyan wrote:
> This is collecting and start of dumping part of cleaned up OpenVZ C/R
> implementation, FYI.

OK, here is the second version, which shows what to do with shared objects
(cr_dump_nsproxy(), cr_dump_task_struct()), introduces more checks
(still no unlinked files) and dumps some more information, including
the connections between structures (cr_pos_*).

Dumping pids is still under consideration, because in OpenVZ pids are saved
as plain numbers, since CLONE_NEWPID is not allowed inside a container. In the
presence of multiple CLONE_NEWPID levels this must present a big problem. It
looks like there is no way around dumping pids as separate objects.

As a result, struct cr_image_pid is variable-sized; I don't know how this will
play out later.
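For illustration only: a task in nested pid namespaces has a different pid number at each level (the kernel's struct pid keeps level + 1 entries in its numbers[] array), so an image record for it naturally ends up variable-sized. The layout below, struct cr_image_pid_sketch and its helpers, is a hypothetical sketch using a C99 flexible array member, not the format used by the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical image record: one pid number per pid-namespace level. */
struct cr_image_pid_sketch {
	uint32_t level;   /* deepest level; nr[] holds level + 1 entries */
	int32_t nr[];     /* nr[0] is the pid as seen from the outermost namespace */
};

/* Size of a record for a pid that is visible at levels 0..level. */
static size_t cr_image_pid_size(uint32_t level)
{
	return sizeof(struct cr_image_pid_sketch) + (level + 1) * sizeof(int32_t);
}

/* Build a record from level + 1 pid numbers; caller frees it. */
static struct cr_image_pid_sketch *cr_image_pid_alloc(uint32_t level,
						       const int32_t *nrs)
{
	struct cr_image_pid_sketch *p = malloc(cr_image_pid_size(level));

	if (!p)
		return NULL;
	p->level = level;
	memcpy(p->nr, nrs, (level + 1) * sizeof(int32_t));
	return p;
}
```

A reader of the image has to consume the fixed header first and only then knows how many numbers follow, which is the awkwardness hinted at above.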

Also, the pid refcount check for external pointers is busted right now,
because a /proc inode pins struct pid, so there is almost always a mismatch
between the refcount and ->o_count.
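A minimal user-space sketch of the shared-object check described above, assuming every object is collected once and each further reference found inside the container bumps its o_count. struct cr_obj, cr_collect() and cr_check_external() are hypothetical names for illustration, not code from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define CR_MAX_OBJS 64

/* Hypothetical record for one kernel object found while walking a container. */
struct cr_obj {
	const void *ptr;  /* address of the object (nsproxy, struct pid, ...) */
	int refcount;     /* the object's own reference count */
	int o_count;      /* references to it found from inside the container */
};

struct cr_ctx {
	struct cr_obj objs[CR_MAX_OBJS];
	size_t nr;
};

/* Account one reference to ptr. Returns 1 the first time the object is
 * seen (dump its contents), 0 if it is shared and already collected
 * (dump only a back-reference), -1 if the table is full. */
static int cr_collect(struct cr_ctx *ctx, const void *ptr, int refcount)
{
	assert(ptr != NULL);
	for (size_t i = 0; i < ctx->nr; i++) {
		if (ctx->objs[i].ptr == ptr) {
			ctx->objs[i].o_count++;
			return 0;
		}
	}
	if (ctx->nr == CR_MAX_OBJS)
		return -1;
	ctx->objs[ctx->nr] = (struct cr_obj){ .ptr = ptr, .refcount = refcount,
					      .o_count = 1 };
	ctx->nr++;
	return 1;
}

/* A refcount that differs from o_count means something outside the
 * container holds a reference (or the walk is broken): refuse to dump. */
static int cr_check_external(const struct cr_ctx *ctx)
{
	for (size_t i = 0; i < ctx->nr; i++)
		if (ctx->objs[i].refcount != ctx->objs[i].o_count)
			return -1;
	return 0;
}
```

With a struct pid whose refcount is 2 because a /proc inode also pins it, only one reference is found inside the container, so the check fails; that is the false positive described above.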

No restore yet. ;-)



 arch/x86/include/asm/unistd_32.h   |    2 
 arch/x86/kernel/syscall_table_32.S |    2 
 include/linux/Kbuild               |    1 
 include/linux/cr.h                 |  169 +++++++++++++
 include/linux/ipc_namespace.h      |    3 
[...]

Nick Piggin | 1 Mar 2009 03:38

Re: [patch][rfc] mm: new address space calls

On Sat, Feb 28, 2009 at 06:19:56PM -0500, Christoph Hellwig wrote:
> On Wed, Feb 25, 2009 at 03:59:57PM -0500, Chris Mason wrote:
> > One problem I have with the btrfs extent state code is that I might
> > choose to release the extent state in releasepage, but the VM might not
> > choose to free the page.  So I've got an up to date page without any of
> > the rest of my state.
> > 
> > Which of these ops covers that? ;)  I'd love to help better document the
> > requirements for these callbacks, I find it confusing every time.
> 
> releasepage also has another problem.  It only gets called after
> discard_buffer has discarded lots of valuable information from the buffers,
> which gets XFS into really bad trouble, as that drops information if
> there is a delalloc extent.

Then I think it just needs to provide its own invalidatepage?

> I'd really like to see some major overhaul in that area, and that also
> extends to documentation (or just naming: why is block_invalidatepage
> calling into a method called ->releasepage, while there is also an
> ->invalidatepage that gets called from the truncate*page routines?)

Those convoluted call paths are really bloody annoying.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Nick Piggin | 1 Mar 2009 03:45

Re: [patch][rfc] mm: new address space calls

On Sat, Feb 28, 2009 at 06:24:21PM -0500, Christoph Hellwig wrote:
> On Wed, Feb 25, 2009 at 11:48:39AM +0100, Nick Piggin wrote:
> > This is about the last change to generic code I need for fsblock.
> > Comments?
> > 
> > Introduce new address space operations sync and release, which can be used
> > by a filesystem to synchronize and release per-address_space private metadata.
> > They generalise sync_mapping_buffers, invalidate_inode_buffers, and
> > remove_inode_buffers calls, and get another step closer to divorcing
> > buffer heads from core mm/fs code.
> 
> >  void invalidate_inode_buffers(struct inode *inode)
> >  {
> > -	if (inode_has_buffers(inode)) {
> > -		struct address_space *mapping = &inode->i_data;
> > +	struct address_space *mapping = &inode->i_data;
> > +
> > +	if (mapping_has_private(mapping)) {
> >  		struct list_head *list = &mapping->private_list;
> >  		struct address_space *buffer_mapping = mapping->assoc_mapping;
> 
> It's not really helping much here, as we still directly poke into the
> buffer_head list.

This is in fs/buffer.c.

Or do you object to the definition of mapping_has_private? Yes, that
still checks the private_list, but it would be trivial to convert it
over to checking a bit in the mapping now. I just didn't do it because
fsblock also uses the private_list.
[...]
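A rough user-space sketch of the idea behind the proposed sync and release operations: generic code asks "does this mapping carry private metadata?" and then dispatches to an optional per-filesystem hook instead of calling the buffer_head helpers directly. struct aops_sketch, struct mapping_sketch, the counters standing in for private_list, and the fallback to buffer-style defaults when a hook is NULL are all assumptions for illustration, not what the patch implements.

```c
#include <assert.h>
#include <stddef.h>

struct mapping_sketch;

/* Hypothetical stand-in for the proposed ->sync and ->release
 * address space operations (not the real address_space_operations). */
struct aops_sketch {
	int (*sync)(struct mapping_sketch *mapping);     /* write out private metadata */
	void (*release)(struct mapping_sketch *mapping); /* drop private metadata */
};

/* Hypothetical, much reduced struct address_space. */
struct mapping_sketch {
	const struct aops_sketch *a_ops;
	int nr_private;  /* metadata objects attached (cf. private_list) */
	int nr_dirty;    /* how many of them still need writing out */
};

/* Counterpart of mapping_has_private(): is any private metadata attached? */
static int has_private(const struct mapping_sketch *mapping)
{
	return mapping->nr_private != 0;
}

/* Hypothetical buffer-style defaults, loosely modelled on
 * sync_mapping_buffers() and remove_inode_buffers(). */
static int buffer_style_sync(struct mapping_sketch *mapping)
{
	mapping->nr_dirty = 0;
	return 0;
}

static void buffer_style_release(struct mapping_sketch *mapping)
{
	assert(mapping->nr_dirty == 0);  /* only clean metadata may be dropped */
	mapping->nr_private = 0;
}

/* Generic entry points: dispatch to the filesystem's hook if it has one,
 * otherwise to the buffer-style default (this fallback is an assumption). */
static int mapping_sync_private(struct mapping_sketch *mapping)
{
	if (!has_private(mapping))
		return 0;
	if (mapping->a_ops && mapping->a_ops->sync)
		return mapping->a_ops->sync(mapping);
	return buffer_style_sync(mapping);
}

static void mapping_release_private(struct mapping_sketch *mapping)
{
	if (!has_private(mapping))
		return;
	if (mapping->a_ops && mapping->a_ops->release)
		mapping->a_ops->release(mapping);
	else
		buffer_style_release(mapping);
}
```

Because has_private() is the only place that knows how "has metadata" is represented, switching it from a list check to a flag bit, as suggested above, would not touch the callers.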

Nick Piggin | 1 Mar 2009 03:50

Re: [rfc][patch 2/5] fsblock: fsblock proper

On Sat, Feb 28, 2009 at 12:40:32PM +0100, Nick Piggin wrote:
> This is the core fsblock code. It also touches a few other little things which
> I should break out, but can basically be ignored.
> 
> Non-fsblock changes:
> fs-writeback.c, page-writeback.c, backing-dev.h: minor changes to support my
> bdflush flusher experiment (flushing data and metadata together based on bdev
> rather than pdflush looping over inodes etc, but this is disabled by default
> unless you uncomment BDFLUSH_FLUSHING in fsblock_types.h).
>  
> main.c: fsblock_init();
> 
> sysctl.c: sysctl disable fsblock freeing on 0 refcount. Just helps comparison.
> 
> truncate.c: should effectively be a noop... some leftover stuff to fix
>             superpage block truncation but it isn't quite finished.
> 
> page-flags.h: PageBlocks alias for PagePrivate, and some debugging stuff.

This seems to have been eaten by vger, so I'll attach a gzip.

Attachment (fsblock.patch.gz): application/x-gzip, 29 KiB

Balbir Singh | 1 Mar 2009 07:29

[PATCH 0/4] Memory controller soft limit patches (v3)


From: Balbir Singh <balbir <at> linux.vnet.ibm.com>

Changelog v3...v2
1. Implemented several review comments from Kosaki-San and Kamezawa-San.
   Please see the individual changelogs for details.

Changelog v2...v1
1. Soft limits now support hierarchies
2. Use spinlocks instead of mutexes for synchronization of the RB tree

Here is v3 of the new soft limit implementation. Soft limits are a new feature
for the memory resource controller; something similar has existed in the
group scheduler in the form of shares. The CPU controller's interpretation
of shares is very different, though.

Soft limits are the most useful feature to have for environments where
the administrator wants to overcommit the system, such that the limits
become active only on memory contention. The current soft limits implementation
provides a soft_limit_in_bytes interface for the memory controller, but not
for the memory+swap controller. The implementation maintains an RB-Tree of groups
that exceed their soft limit and starts reclaiming from the group that
exceeds its limit by the largest amount.
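A sketch of the ordering used to pick a victim, assuming groups are keyed by how far their usage exceeds the soft limit. A plain, unbalanced binary search tree stands in for the kernel rbtree here (a real implementation would use lib/rbtree and stay balanced); struct sl_node, sl_insert() and sl_pop_largest() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node for one memory cgroup that is over its soft limit. */
struct sl_node {
	unsigned long excess;         /* usage - soft_limit */
	struct sl_node *left, *right;
};

/* Insert ordered by excess; equal keys go to the right. */
static void sl_insert(struct sl_node **root, struct sl_node *n)
{
	assert(n != NULL);
	n->left = n->right = NULL;
	while (*root)
		root = n->excess < (*root)->excess ? &(*root)->left
						   : &(*root)->right;
	*root = n;
}

/* Detach and return the node with the largest excess (the rightmost one),
 * or NULL if no group is over its soft limit. */
static struct sl_node *sl_pop_largest(struct sl_node **root)
{
	if (!*root)
		return NULL;
	while ((*root)->right)
		root = &(*root)->right;
	struct sl_node *n = *root;
	*root = n->left;              /* the rightmost node has no right child */
	n->left = NULL;
	return n;
}
```

Reclaim then only needs the rightmost node, and a group whose excess changes is removed and reinserted under its new key.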

If there are no major objections to the patches, I would like to get them
included in -mm.

TODOs

1. The current implementation maintains the delta from the soft limit
[...]

Balbir Singh | 1 Mar 2009 07:30

[PATCH 1/4] Memory controller soft limit documentation (v3)

Add documentation for soft limit feature support.

From: Balbir Singh <balbir <at> linux.vnet.ibm.com>

Signed-off-by: Balbir Singh <balbir <at> linux.vnet.ibm.com>
---

 Documentation/cgroups/memory.txt |   27 ++++++++++++++++++++++++++-
 1 files changed, 26 insertions(+), 1 deletions(-)

diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index a98a7fe..812cb74 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -360,7 +360,32 @@ cgroups created below it.

 NOTE2: This feature can be enabled/disabled per subtree.

-7. TODO
+7. Soft limits
+
+Soft limits allow for greater sharing of memory. The idea behind soft limits
+is to allow control groups to use as much of the memory as needed, provided
+
+a. There is no memory contention
+b. They do not exceed their hard limit
+
+When the system detects memory contention or low memory (kswapd is woken up)
+control groups are pushed back to their soft limits. If the soft limit of each
+control group is very high, they are pushed back as much as possible to make
[...]

Balbir Singh | 1 Mar 2009 07:30

[PATCH 2/4] Memory controller soft limit interface (v3)


From: Balbir Singh <balbir <at> linux.vnet.ibm.com>

Changelog v2...v1
1. Add support for res_counter_check_soft_limit_locked. This is used
   by the hierarchy code.

Add an interface to get and set soft limits. Soft limits for the memory plus
swap controller (memsw) are currently not supported. Resource counters have
been enhanced to support soft limits, and a new type RES_SOFT_LIMIT has been
added. Unlike hard limits, soft limits can be set directly and do not
need any reclaim or checks before being changed to a new value.
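To make the hard/soft asymmetry concrete, here is a trimmed-down, hypothetical resource counter (struct res_counter_sketch and the sketch_* helpers are illustrative names, not the res_counter API). Lowering a hard limit below current usage is refused with -EBUSY so the caller can reclaim and retry; a soft limit is simply stored, even below current usage.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, trimmed-down resource counter with a soft limit. */
struct res_counter_sketch {
	unsigned long long usage;       /* current usage in bytes */
	unsigned long long limit;       /* hard limit: charges beyond it fail */
	unsigned long long soft_limit;  /* only acted upon under memory contention */
};

/* A hard limit below current usage cannot just be stored: the caller
 * must reclaim first and retry, so this sketch returns -EBUSY. */
static int sketch_set_limit(struct res_counter_sketch *cnt,
			    unsigned long long limit)
{
	if (cnt->usage > limit)
		return -EBUSY;
	cnt->limit = limit;
	return 0;
}

/* A soft limit is stored directly, even below current usage; the group
 * is only pushed back towards it when memory becomes tight. */
static void sketch_set_soft_limit(struct res_counter_sketch *cnt,
				  unsigned long long soft_limit)
{
	cnt->soft_limit = soft_limit;
}

/* Nonzero while usage is within the soft limit. */
static int sketch_within_soft_limit(const struct res_counter_sketch *cnt)
{
	return cnt->usage <= cnt->soft_limit;
}
```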

Kamezawa-San raised the question of whether soft limits should belong
to res_counter. Since all resources understand the basic concepts of
hard and soft limits, it is justified to add soft limits here. Soft limits
are a generic resource usage feature; even file system quotas support
soft limits.

Signed-off-by: Balbir Singh <balbir <at> linux.vnet.ibm.com>
---

 include/linux/res_counter.h |   47 +++++++++++++++++++++++++++++++++++++++++++
 kernel/res_counter.c        |    3 +++
 mm/memcontrol.c             |   20 ++++++++++++++++++
 3 files changed, 70 insertions(+), 0 deletions(-)

diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
index 4c5bcf6..b5f14fa 100644
--- a/include/linux/res_counter.h
[...]

Balbir Singh | 1 Mar 2009 07:30

[PATCH 3/4] Memory controller soft limit organize cgroups (v3)


From: Balbir Singh <balbir <at> linux.vnet.ibm.com>

Changelog v3...v2
1. Add only the ancestor to the RB-Tree
2. Use css_tryget/css_put instead of mem_cgroup_get/mem_cgroup_put

Changelog v2...v1
1. Add support for hierarchies
2. The res_counter that is highest in the hierarchy is returned when the soft
   limit is exceeded. Since we do hierarchical reclaim and add all
   groups exceeding their soft limits, this approach seems to work well
   in practice.

This patch introduces an RB-Tree for storing memory cgroups that are over their
soft limit. The overall goal is to

1. Add a memory cgroup to the RB-Tree when the soft limit is exceeded.
   We are careful about updates: they take place only after a particular
   time interval has passed.
2. Remove the node from the RB-Tree when the usage goes below the soft
   limit.
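A hedged sketch of the update policy in the list above: the tree is touched only if enough time has passed since the last update, and a group that has dropped back under its soft limit is taken off the tree. struct memcg_sketch, soft_limit_update(), SOFTLIMIT_UPDATE_INTERVAL, its value and the tick-based clock are all assumptions; the patch's actual interval and bookkeeping are not shown in this message.

```c
#include <assert.h>

/* Hypothetical rate limit for tree updates, in ticks. */
#define SOFTLIMIT_UPDATE_INTERVAL 4UL

enum tree_action { TREE_NONE, TREE_INSERT, TREE_REMOVE, TREE_REPOSITION };

/* Hypothetical per-cgroup state relevant to the soft limit tree. */
struct memcg_sketch {
	unsigned long long usage;
	unsigned long long soft_limit;
	int on_tree;                /* currently linked into the tree? */
	unsigned long last_update;  /* tick of the last tree update */
};

/* Decide what to do with the tree after usage changed at tick now. */
static enum tree_action soft_limit_update(struct memcg_sketch *m,
					  unsigned long now)
{
	/* Skip the comparatively expensive tree update if the last one
	 * was too recent. */
	if (now - m->last_update < SOFTLIMIT_UPDATE_INTERVAL)
		return TREE_NONE;
	m->last_update = now;

	if (m->usage > m->soft_limit) {
		if (!m->on_tree) {
			m->on_tree = 1;
			return TREE_INSERT;
		}
		return TREE_REPOSITION;  /* excess changed: re-key the node */
	}
	if (m->on_tree) {
		m->on_tree = 0;
		return TREE_REMOVE;
	}
	return TREE_NONE;
}
```

The trade-off is that the tree may briefly hold a stale position (or a group that is already back under its limit) in exchange for not taking the tree lock on every charge.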

The next set of patches will exploit the RB-Tree to get the group that is
over its soft limit by the largest amount and reclaim from it, when we
face memory contention.

Signed-off-by: Balbir Singh <balbir <at> linux.vnet.ibm.com>
---

[...]

Balbir Singh | 1 Mar 2009 07:30

[PATCH 4/4] Memory controller soft limit reclaim on contention (v3)


From: Balbir Singh <balbir <at> linux.vnet.ibm.com>

Changelog v3...v2
1. Convert several arguments to hierarchical reclaim to flags, thereby
   consolidating them
2. The reclaim for soft limits is now triggered from kswapd
3. try_to_free_mem_cgroup_pages() now accepts an optional zonelist argument

Changelog v2...v1
1. Added support for hierarchical soft limits

This patch allows reclaim from memory cgroups on contention (via the
kswapd() path) only if the order is 0.

Memory cgroup soft limit reclaim finds the group that exceeds its soft limit
by the largest amount, reclaims pages from it, and then reinserts the
cgroup into its correct place in the rbtree.
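A simplified sketch of the reclaim cycle described above, with hypothetical names (struct sl_group, soft_limit_reclaim()) and made-up batching. It assumes every page asked for can be reclaimed and scans an array instead of walking the rbtree, so it only illustrates the pick-largest, reclaim, reinsert loop and the order-0 restriction.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical memory cgroup: usage and soft limit in pages. */
struct sl_group {
	unsigned long usage;
	unsigned long soft_limit;
};

static unsigned long sl_excess(const struct sl_group *g)
{
	return g->usage > g->soft_limit ? g->usage - g->soft_limit : 0;
}

/* Linear scan standing in for "take the largest node off the rbtree". */
static struct sl_group *sl_largest(struct sl_group *groups, size_t n)
{
	struct sl_group *best = NULL;

	for (size_t i = 0; i < n; i++)
		if (sl_excess(&groups[i]) > 0 &&
		    (!best || sl_excess(&groups[i]) > sl_excess(best)))
			best = &groups[i];
	return best;
}

/* Reclaim up to nr_to_reclaim pages, at most batch pages from the worst
 * offender per pass; as in the patch, do nothing unless order is 0. */
static unsigned long soft_limit_reclaim(struct sl_group *groups, size_t n,
					int order, unsigned long nr_to_reclaim,
					unsigned long batch)
{
	unsigned long reclaimed = 0;

	if (order != 0 || batch == 0)
		return 0;
	while (reclaimed < nr_to_reclaim) {
		struct sl_group *victim = sl_largest(groups, n);
		if (!victim)
			break;          /* nobody is over its soft limit */
		unsigned long want = sl_excess(victim);
		if (want > batch)
			want = batch;
		if (want > nr_to_reclaim - reclaimed)
			want = nr_to_reclaim - reclaimed;
		victim->usage -= want;  /* pretend every requested page is freed */
		reclaimed += want;
		/* With a real tree, victim would now be reinserted under its new
		 * (smaller) excess, or left off if it is back under its limit. */
	}
	return reclaimed;
}
```

With groups over their soft limits by 200 and 50 pages and a batch of 64, the first group is hit three times before the second becomes the largest offender, so the excess is worked down from the top.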

Signed-off-by: Balbir Singh <balbir <at> linux.vnet.ibm.com>
---

 include/linux/memcontrol.h |    2 +
 include/linux/swap.h       |    1 
 mm/memcontrol.c            |  134 ++++++++++++++++++++++++++++++++++++++------
 mm/vmscan.c                |    8 ++-
 4 files changed, 125 insertions(+), 20 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 18146c9..bf12451 100644
[...]

Dave Chinner | 1 Mar 2009 09:17

Re: [patch][rfc] mm: hold page lock over page_mkwrite

On Wed, Feb 25, 2009 at 10:36:29AM +0100, Nick Piggin wrote:
> I need this in fsblock because I am working to ensure filesystem metadata
> can be correctly allocated and refcounted. This means that page cleaning
> should not require memory allocation (to be really robust).

Which, unfortunately, is just a dream for any filesystem that uses
delayed allocation, i.e. they have to walk the free space trees,
which may need to be read from disk and therefore require memory
to succeed....

Cheers,

Dave.
-- 
Dave Chinner
david <at> fromorbit.com

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo <at> kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont <at> kvack.org"> email <at> kvack.org </a>

