Minchan Kim | 1 Oct 09:10 2011

Re: [patch 1/5] mm: exclude reserved pages from dirtyable memory

On Fri, Sep 30, 2011 at 09:17:20AM +0200, Johannes Weiner wrote:
> The amount of dirtyable pages should not include the full number of
> free pages: there is a number of reserved pages that the page
> allocator and kswapd always try to keep free.
> The closer (reclaimable pages - dirty pages) is to the number of
> reserved pages, the more likely it becomes for reclaim to run into
> dirty pages:
>        +----------+ ---
>        |   anon   |  |
>        +----------+  |
>        |          |  |
>        |          |  -- dirty limit new    -- flusher new
>        |   file   |  |                     |
>        |          |  |                     |
>        |          |  -- dirty limit old    -- flusher old
>        |          |                        |
>        +----------+                       --- reclaim
>        | reserved |
>        +----------+
>        |  kernel  |
>        +----------+
> This patch introduces a per-zone dirty reserve that takes both the
> lowmem reserve as well as the high watermark of the zone into account,
> and a global sum of those per-zone values that is subtracted from the
> global amount of dirtyable pages.  The lowmem reserve is unavailable
> to page cache allocations and kswapd tries to keep the high watermark
> free.  We don't want to end up in a situation where reclaim has to
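
A rough sketch of the accounting described in the quoted text, assuming a
simplified zone model. The names below (zone_sketch, zone_dirty_reserve,
global_dirtyable) are hypothetical and are not the kernel's. Only the idea
comes from the description: a per-zone reserve built from the lowmem reserve
plus the high watermark, summed globally and subtracted from the dirtyable
total.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a memory zone (not struct zone). */
struct zone_sketch {
	unsigned long free_pages;        /* currently free pages */
	unsigned long reclaimable_pages; /* pages that reclaim could free */
	unsigned long lowmem_reserve;    /* unavailable to page cache allocations */
	unsigned long high_wmark;        /* pages kswapd tries to keep free */
};

/* Per-zone dirty reserve: lowmem reserve plus high watermark. */
unsigned long zone_dirty_reserve(const struct zone_sketch *z)
{
	return z->lowmem_reserve + z->high_wmark;
}

/* Global dirtyable pages: free + reclaimable, minus the summed reserves. */
unsigned long global_dirtyable(const struct zone_sketch *zones, size_t nr)
{
	unsigned long total = 0, reserve = 0;

	for (size_t i = 0; i < nr; i++) {
		total += zones[i].free_pages + zones[i].reclaimable_pages;
		reserve += zone_dirty_reserve(&zones[i]);
	}
	/* Clamp at zero so a large reserve cannot underflow the result. */
	return total > reserve ? total - reserve : 0;
}
```

With the reserve subtracted, the dirty limits derived from this total sit
further away from the point where reclaim starts, which is the shift the
diagram shows from "old" to "new".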

Joakim Tjernlund | 1 Oct 16:02 2011

Re: [PATCH v4] crc32c: Implement CRC32c with slicing-by-8 algorithm

"Darrick J. Wong" <djwong <at> us.ibm.com> wrote on 2011/09/30 21:29:56:
> The existing CRC32c implementation uses Sarwate's algorithm to calculate the
> code one byte at a time.  Using a slicing-by-8 algorithm adapted from Bob
> Pearson, we can process buffers 8 bytes at a time, for a substantial increase
> in performance.
> The motivation for this patchset is that I am working on adding full metadata
> checksumming to ext4 and jbd2.  As far as performance impact of adding
> checksumming goes, I see nearly no change with a standard mail server ffsb
> simulation.  On a test that involves only metadata operations (file creation
> and deletion, and fallocate/truncate), I see a drop of about 50 percent with
> the current kernel crc32c implementation; this improves to a drop of about 20
> percent with the enclosed crc32c code.
> When metadata is only a small fraction of total IO, this new implementation
> doesn't help much.
> However, when we are doing IO that is almost all metadata (such as rm -rf'ing a
> tree), then this patch speeds up the operation substantially.
> Given that iscsi, sctp, and btrfs also use crc32c, this patchset should improve
> their speed as well.  I have some preliminary results[1] that show the
> difference in various crc algorithms that I've come across: the "crc32c-by8-le"
> column is the new algorithm in the patch; the "crc32c" column is the current
> crc32c kernel implementation; and the "crc32-kern-le" column is the current
> crc32 kernel implementation, which is similar to the results one gets for
> CONFIG_CRC32C_SLICEBY4=y.  As you can see, the new implementation runs at
> nearly 4x the speed of the current implementation; even the slimmer slice-by-4
> implementation is generally 2-3x faster.
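
For readers unfamiliar with the two approaches being compared, here is a
minimal userspace sketch of a byte-at-a-time (Sarwate) CRC32c next to a
slicing-by-8 variant. It is not the code from the patch: the function names,
the lazily built tables, and the pre/post inversion of the CRC value are
assumptions made for illustration, and the kernel helpers may treat the seed
differently.

```c
#include <stddef.h>
#include <stdint.h>

#define CRC32C_POLY_LE 0x82F63B78u /* Castagnoli polynomial, bit-reflected */

static uint32_t crc_tab[8][256];
static int crc_tab_ready;

/* Build the base table, then derive the seven extra slicing tables. */
static void crc32c_init_tables(void)
{
	for (uint32_t i = 0; i < 256; i++) {
		uint32_t crc = i;

		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ ((crc & 1) ? CRC32C_POLY_LE : 0);
		crc_tab[0][i] = crc;
	}
	for (uint32_t i = 0; i < 256; i++)
		for (int k = 1; k < 8; k++)
			crc_tab[k][i] = (crc_tab[k - 1][i] >> 8) ^
					crc_tab[0][crc_tab[k - 1][i] & 0xff];
	crc_tab_ready = 1;
}

/* Sarwate: one table lookup per input byte. */
uint32_t crc32c_bytewise(uint32_t crc, const uint8_t *p, size_t len)
{
	if (!crc_tab_ready)
		crc32c_init_tables();
	crc = ~crc;
	while (len--)
		crc = (crc >> 8) ^ crc_tab[0][(crc ^ *p++) & 0xff];
	return ~crc;
}

/* Slicing-by-8: eight independent lookups per 8-byte block. */
uint32_t crc32c_by8(uint32_t crc, const uint8_t *p, size_t len)
{
	if (!crc_tab_ready)
		crc32c_init_tables();
	crc = ~crc;
	while (len >= 8) {
		/* Fold the running CRC into the first four bytes of the block. */
		uint32_t lo = crc ^ ((uint32_t)p[0] | (uint32_t)p[1] << 8 |
				     (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24);

		crc = crc_tab[7][lo & 0xff] ^ crc_tab[6][(lo >> 8) & 0xff] ^
		      crc_tab[5][(lo >> 16) & 0xff] ^ crc_tab[4][lo >> 24] ^
		      crc_tab[3][p[4]] ^ crc_tab[2][p[5]] ^
		      crc_tab[1][p[6]] ^ crc_tab[0][p[7]];
		p += 8;
		len -= 8;
	}
	while (len--) /* leftover tail, byte at a time */
		crc = (crc >> 8) ^ crc_tab[0][(crc ^ *p++) & 0xff];
	return ~crc;
}
```

Both functions should return the same value for any buffer. The by-8 variant
trades 8 KiB of tables for fewer dependent steps per input byte, which is
where the speedup discussed above would come from.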

LuVar | 1 Oct 17:19 2011

Re: [GIT] Bcache version 12

Hi here.

----- "Dan J Williams" <dan.j.williams <at> intel.com> wrote:

> On Fri, Sep 30, 2011 at 12:14 AM, Kent Overstreet
> <kent.overstreet <at> gmail.com> wrote:
> >> > Cache devices have a basically identical superblock as backing devices
> >> > though, and some of the registration code is shared, but cache devices
> >> > don't correspond to any block devices.
> >>
> >> Just like a raid0 is a virtual creation from two block devices?  Or
> >> some other meaning of "don't correspond"?
> >
> > No.
> >
> > Remember, you can hang multiple backing devices off a cache.
> >
> > Each backing device shows up as a new block device - i.e. if you're
> > caching /dev/sdb, you now use it as /dev/bcache0.
> >
> > But the SSD doesn't belong to any of those /dev/bcacheN devices.
>
> So to clarify I read that as "it belongs to all of them".  The ssd
> (/dev/sda, for example) can cache the contents of N block devices, and
> to get to the cached version of each of those you go through

Andres Freund | 1 Oct 22:46 2011

Re: Improve lseek scalability v3


On Friday, September 16, 2011 01:06:46 AM Andi Kleen wrote:
> v3: No changes, except rebase. All reviews passed. Just reposting
> for merging.
Is anything/anyone still objecting to this patchset?

I just retested it on top of v3.1-rc8 minus the btrfs parts (which don't apply
cleanly anymore because a modified version of 1/7 was merged) and it has worked
fine through some hours of fs-heavy database benchmarking and development.

Following is a seemingly trivial forward-port of 7/7. But since I have next
to no clue about fs development and even less about btrfs - which I never
used - take it with a grain of salt.
It seems a bit ugly to have the mutex_unlock in three places btw. A 2nd patch
fixes that, no idea whether it's worth the churn.
Both are compile tested only.

Even on this machine (2 x E5520 (4 cores)) there seems to be a benefit of
about 1.5%. That is not enough cores to get into the actually problematic
performance areas presented by Robert, though, and the variance between runs
is a bit too high to call the result reliable.



PS: I have no clue what to do with the s-o-b and changelog when forward 
porting a patch... So I just copied the original message - which seems wrong.


Andres Freund | 1 Oct 22:49 2011

[PATCH 1/2] LSEEK: BTRFS: Avoid i_mutex for SEEK_{CUR,SET,END}

Don't need the i_mutex for those cases, only for SEEK_HOLE/DATA.

Really-From: Andi Kleen <ak <at> linux.intel.com>
Signed-off-by: Andi Kleen <ak <at> linux.intel.com>
Signed-off-by: Andres Freund <andres <at> anarazel.de>
 fs/btrfs/file.c |   27 +++++++++++----------------
 1 files changed, 11 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 7a13337..5bc7116 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1809,24 +1809,19 @@ static loff_t btrfs_file_llseek(struct file *file, loff_t offset, int origin)
 	struct inode *inode = file->f_mapping->host;
 	int ret;

+	if (origin != SEEK_DATA && origin != SEEK_HOLE)
+		return generic_file_llseek(file, offset, origin);
-	switch (origin) {
-	case SEEK_END:
-	case SEEK_CUR:
-		offset = generic_file_llseek(file, offset, origin);
-		goto out;
-	case SEEK_DATA:
-	case SEEK_HOLE:
-		if (offset >= i_size_read(inode)) {

Andres Freund | 1 Oct 22:50 2011

[PATCH 2/2] btrfs: Don't have multiple paths to error out in btrfs_file_llseek

Using multiple paths seems to invite overlooking one when adding new
stuff in the future.

Signed-off-by: Andres Freund <andres <at> anarazel.de>
 fs/btrfs/file.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 5bc7116..701c633 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1814,14 +1814,14 @@ static loff_t btrfs_file_llseek(struct file *file, loff_t offset, int origin)

 	if (offset >= i_size_read(inode)) {
-		mutex_unlock(&inode->i_mutex);
-		return -ENXIO;
+		offset = -ENXIO;
+		goto out;

 	ret = find_desired_extent(inode, &offset, origin);
 	if (ret) {
-		mutex_unlock(&inode->i_mutex);
-		return ret;
+		offset = ret;
+		goto out;

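
The single-exit pattern this patch moves to can be illustrated outside the
kernel. The following is a userspace analogue only, assuming a pthread mutex
in place of i_mutex; file_sketch, find_extent_sketch and seek_data_sketch are
hypothetical names and not part of btrfs.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for an inode: a size and position behind a mutex. */
struct file_sketch {
	pthread_mutex_t lock;
	long size;
	long pos;
};

/* Hypothetical extent lookup: treats every non-negative offset as data. */
static int find_extent_sketch(const struct file_sketch *f, long *offset)
{
	(void)f;
	return *offset < 0 ? -EINVAL : 0;
}

/* Every path, success or error, leaves through the single unlock at "out". */
long seek_data_sketch(struct file_sketch *f, long offset)
{
	int ret;

	pthread_mutex_lock(&f->lock);

	if (offset >= f->size) {
		offset = -ENXIO;
		goto out;
	}

	ret = find_extent_sketch(f, &offset);
	if (ret) {
		offset = ret;
		goto out;
	}

	f->pos = offset;
out:
	pthread_mutex_unlock(&f->lock);
	return offset;
}
```

A new error case added later only needs to set the return value and jump to
"out", so there is no extra unlock call to forget.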

Andi Kleen | 2 Oct 07:28 2011

Re: Improve lseek scalability v3

Thanks for testing. According to Viro the patchkit is in his queue, so hopefully
next merge window.

To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Andy Walls | 2 Oct 14:13 2011

Re: [RFCv4 PATCH 2/6] ivtv: only start streaming in poll() if polling for input.

On Thu, 2011-09-29 at 09:44 +0200, Hans Verkuil wrote:
> From: Hans Verkuil <hans.verkuil <at> cisco.com>
> Signed-off-by: Hans Verkuil <hans.verkuil <at> cisco.com>


Acked-by: Andy Walls <awalls <at> md.metrocast.net>


> ---
>  drivers/media/video/ivtv/ivtv-fileops.c |    6 ++++--
>  1 files changed, 4 insertions(+), 2 deletions(-)
> diff --git a/drivers/media/video/ivtv/ivtv-fileops.c b/drivers/media/video/ivtv/ivtv-fileops.c
> index 38f0522..a931ecf 100644
> --- a/drivers/media/video/ivtv/ivtv-fileops.c
> +++ b/drivers/media/video/ivtv/ivtv-fileops.c
> @@ -744,8 +744,9 @@ unsigned int ivtv_v4l2_dec_poll(struct file *filp, poll_table *wait)
>  	return res;
>  }
> -unsigned int ivtv_v4l2_enc_poll(struct file *filp, poll_table * wait)
> +unsigned int ivtv_v4l2_enc_poll(struct file *filp, poll_table *wait)
>  {
> +	unsigned long req_events = poll_requested_events(wait);
>  	struct ivtv_open_id *id = fh2id(filp->private_data);
>  	struct ivtv *itv = id->itv;
>  	struct ivtv_stream *s = &itv->streams[id->type];
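
The behaviour named in the subject line can be sketched in userspace terms.
This is an illustrative model only, not the ivtv code: stream_sketch and
enc_poll_sketch are hypothetical, and the requested-events mask is passed in
directly rather than obtained from poll_requested_events().

```c
#include <poll.h>
#include <stdbool.h>

/* Hypothetical model of an encoder stream. */
struct stream_sketch {
	bool streaming;  /* has capture been started? */
	bool data_ready; /* is there data to read? */
};

/*
 * Start capture as a side effect of poll() only when the caller is
 * actually waiting for input. Polling for other events leaves the
 * stream untouched.
 */
unsigned int enc_poll_sketch(struct stream_sketch *s, unsigned long req_events)
{
	unsigned int res = 0;

	if (!s->streaming && (req_events & POLLIN))
		s->streaming = true; /* stand-in for starting the encoder */

	if (s->data_ready)
		res |= POLLIN;
	return res;
}
```

In this model a caller that polls only for POLLOUT or POLLPRI no longer
triggers a capture it never intended to read from.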

Dmitry Monakhov | 2 Oct 21:44 2011

[PATCH 1/6] RFC: introduce extended inode owner identifier v8

Hi, I've updated the long-standing project quota id patch-set.
Please take a look at it and reply with your opinion, especially if you
do not like it for some reason :) This is really important because I do
want this feature to be merged sooner or later.

*Feature description*
1) An inode may have a project identifier which has the same meaning as uid/gid.
2) The id is stored in the inode's xattr named "system.project_id".
3) The id is inherited from the parent inode on creation.
4) The id is cached in the in-memory fs_inode structure and may be accessed
   via s_op->get_prjid(). This field is guarded by CONFIG_PROJECT_ID,
   so no memory is wasted when the option is disabled.

5) Since id is cached in memory it may be used for different purposes
   such as:
5A) Implement an additional quota id space orthogonal to uid/gid. This is
    useful for managing quota for some filesystem hierarchy (chroot or
    container over bindmount).

*User interface*
The project id is managed via the generic xattr interface "system.project_id".
This is good because
 1) We may use an already existing interface.
 2) xattr is already supported by generic utilities such as tar and rsync.

1) generic projectid support
2) generic project quota support
3) ext4: small mount flags cleanup
4) ext4 project support implementation
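
To make the inheritance rule and the "orthogonal quota id space" concrete,
here is a toy model in plain C. It is a sketch under stated assumptions, not
code from the patch-set: inode_sketch, create_child and project_usage are
hypothetical names, and real quota accounting is far more involved.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory inode with the extra owner identifier. */
struct inode_sketch {
	uint32_t uid;
	uint32_t gid;
	uint32_t prjid;  /* project id, independent of uid/gid */
	uint64_t blocks; /* space charged to this inode */
};

/* A new inode takes uid/gid from its creator but prjid from its parent. */
struct inode_sketch create_child(const struct inode_sketch *parent,
				 uint32_t uid, uint32_t gid)
{
	struct inode_sketch child = {
		.uid = uid,
		.gid = gid,
		.prjid = parent->prjid,
		.blocks = 0,
	};
	return child;
}

/* Usage charged to a project: summed over its inodes, whoever owns them. */
uint64_t project_usage(const struct inode_sketch *inodes, size_t nr,
		       uint32_t prjid)
{
	uint64_t used = 0;

	for (size_t i = 0; i < nr; i++)
		if (inodes[i].prjid == prjid)
			used += inodes[i].blocks;
	return used;
}
```

Because the id propagates from the parent, everything created under one
directory tree ends up in the same project, which is what makes per-chroot
or per-container quota possible regardless of which users own the files.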

Dmitry Monakhov | 2 Oct 21:44 2011

[PATCH 2/6] Add additional owner identifier

This patch adds a project inode identifier. The project ID may be used as an
auxiliary owner specifier in addition to the standard uid/gid.

Signed-off-by: Dmitry Monakhov <dmonakhov <at> openvz.org>
 fs/Kconfig            |    7 +++++++
 include/linux/fs.h    |    1 +
 include/linux/xattr.h |    3 +++
 3 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/fs/Kconfig b/fs/Kconfig
index 5f4c45d..f3f1b12 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -56,6 +56,13 @@ config FILE_LOCKING
 	  This option enables standard file locking support, required
           for filesystems like NFS and for the flock() system
           call. Disabling this option saves about 11k.
+config PROJECT_ID
+	bool "Enable project inode identifier"
+	default y
+	help
+	  This option enables project inode identifier. Project ID
+	  may be used as auxiliary owner specifier in addition to
+	  standard uid/gid.

 source "fs/notify/Kconfig"

diff --git a/include/linux/fs.h b/include/linux/fs.h
index e7b118c..9059ad4 100644