Dave Chinner | 1 Apr 2010 01:23

Re: [PATCH] quota: add per-inode reservation space sanity checks.

On Wed, Mar 31, 2010 at 11:17:39AM +0400, dmonakhov <at> openvz.org wrote:
> Dave Chinner <david <at> fromorbit.com> writes:
> > On Wed, Mar 31, 2010 at 09:20:17AM +0400, Dmitry Monakhov wrote:
> >> BTW: I've attached my testcase. I hope it will be useful for you.
> >> It is able to catch quota inconsistency caused by incorrect symlink
> >> handling, but it is not reliable for the writepage/fallocate bug in ext4.
....
> > There's already a "_require_quota()" function in common.quota that
> Yep. overlooked this one.
> > checks if the filesystem being tested supports quotas and that the
> > quota tools are installed. Can you add these checks to that
> > function?
> >
> > _require_quota also calls _notrun directly, so no need for the
> > quota_supported variable, either.
> >
> > Also, can you use 8 space tabs for indenting?
> Ok, will redo according to all your comments. To make the testcase more
> useful I want to grep dmesg. But currently this technique
> is not used in xfs-testcase.

It hasn't been used because in the past kernel output has not been
needed to report a test success or failure. If the test fails, and
there's pertinent information in the kernel log, then normally the
developer grabs that him/herself after the failure.

> How can i do it in a convenient way?

What information do you want to grab from the kernel log?  If you
make the test linux platform specific (IIRC you already have), then
(Continue reading)

Nicholas A. Bellinger | 1 Apr 2010 08:23

[LSF/VM TOPIC] TCM v4.0 generic configfs fabric infrastructure

Greetings all,

[Topic]
TCM v4.0 generic configfs fabric infrastructure

[Abstract]
This discussion focuses on the generic configfs fabric module
infrastructure WIP currently under development for TCM v4.0 which aims
to provide a generic set of struct config_item_types available for
target mode fabric module configfs usage, and set of context specific
macros available for adding target mode fabric module dependent RW
configfs attributes.  The main goal is to enable the development of new
and conversion of existing target mode fabric modules without the hard
requirement of fabric module authors needing to implement their own
configfs code.

The patch announcements for the first and second series of TCM code and
LIO-Target and TCM_Loop fabric module conversion commits into
lio-core-2.6.git/lio-4.0 can be found here:

http://marc.info/?l=linux-scsi&m=126688745100913&w=2
http://marc.info/?l=linux-scsi&m=126710672402056&w=2

and code for implementing the generic WWN, TPGT, LUN, NodeACL and
MappedLUN struct configs_item_types is available here:

http://git.kernel.org/?p=linux/kernel/git/nab/lio-core-2.6.git;a=blob;f=drivers/target/target_core_fabric_configfs.c;hb=lio-4.0

Best,

(Continue reading)

Jan Kara | 1 Apr 2010 11:12

Re: [PATCH 2/3] ext4: journalled quota optimization

  Hi,

On Sat 27-03-10 15:15:39, Dmitry Monakhov wrote:
> Currently each quota modification results in write_dquot()
> and later dquot_commit().  This means that each quota modification
> function must wait for dqio_mutex, which is a *huge* performance
> penalty on big SMP systems. AFAIU the core idea of this implementation
> is to guarantee that each quota modification will be written to the journal
> atomically. But in fact this is not always true, because the dquot may be
> changed after dquot modification, but before it was committed to disk.
  We were already discussing this when you last submitted the patch.
dqio_mutex has nothing to do with journaling. It is there so that two
writes to quota file cannot happen in parallel because that could cause
corruption. Without dqio_mutex, the following would be possible:
  Task 1				Task 2
...
  qtree_write_dquot()
    ...
    info->dqi_ops->mem2disk_dqblk
					modify dquot
					mark_dquot_dirty
					...
					qtree_write_dquot()
					  - writes newer information
    ret = sb->s_op->quota_write
      - overwrites the new information
        with an old one.
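
The interleaving above can be replayed deterministically in user space.
What follows is only a minimal sketch under loud assumptions: every toy_*
name is hypothetical, a single long stands in for the dquot contents, and
a pthread mutex stands in for dqio_mutex. It is not the kernel code.

```c
#include <pthread.h>

/* Toy stand-ins (assumptions, not kernel types). */
struct toy_dquot { long usage; };	/* in-memory quota state */
struct toy_quota_file { long usage; };	/* "on-disk" quota state */

pthread_mutex_t toy_dqio_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Step 1: snapshot the in-memory state into a buffer. */
long toy_mem2disk(const struct toy_dquot *dq)
{
	return dq->usage;
}

/* Step 2: write a previously taken snapshot to the file. */
void toy_quota_write(struct toy_quota_file *file, long buf)
{
	file->usage = buf;
}

/* Both steps under one lock, so no other writer can get in between. */
void toy_write_dquot(struct toy_quota_file *file, const struct toy_dquot *dq)
{
	long buf;

	pthread_mutex_lock(&toy_dqio_mutex);
	buf = toy_mem2disk(dq);
	toy_quota_write(file, buf);
	pthread_mutex_unlock(&toy_dqio_mutex);
}

/* Replays the diagram by hand, without the lock; returns the file state. */
long toy_replay_unlocked(void)
{
	struct toy_dquot dq = { 1 };
	struct toy_quota_file file = { 0 };
	long stale;

	stale = toy_mem2disk(&dq);			/* Task 1: snapshot */
	dq.usage = 2;					/* Task 2: modify dquot */
	toy_quota_write(&file, toy_mem2disk(&dq));	/* Task 2: newer info */
	toy_quota_write(&file, stale);			/* Task 1: overwrites it */
	return file.usage;
}

/* Same sequence of events, but each write is one locked unit. */
long toy_replay_locked(void)
{
	struct toy_dquot dq = { 1 };
	struct toy_quota_file file = { 0 };

	toy_write_dquot(&file, &dq);	/* Task 1 */
	dq.usage = 2;			/* Task 2: modify dquot */
	toy_write_dquot(&file, &dq);	/* Task 2 */
	return file.usage;
}
```

In this model toy_replay_unlocked() leaves the stale value in the file
while memory holds the newer one, and toy_replay_locked() leaves the file
matching memory.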

>  | Task 1                           | Task 2                      |
>  | alloc_space(nr)                  |                             |
(Continue reading)

Denys Fedorysychenko | 1 Apr 2010 12:42

Re: endless sync on bdi_sched_wait()? 2.6.33.1

On Thursday 01 April 2010 01:12:54 Dave Chinner wrote:
> On Wed, Mar 31, 2010 at 07:07:31PM +0300, Denys Fedorysychenko wrote:
> > I have a proxy server with "loaded" squid. On some moment i did sync, and
> > expecting it to finish in reasonable time. Waited more than 30 minutes,
> > still "sync". Can be reproduced easily.
> >
> > Here is some stats and info:
> >
> > Linux SUPERPROXY 2.6.33.1-build-0051 #16 SMP Wed Mar 31 17:23:28 EEST
> > 2010 i686 GNU/Linux
> >
> > SUPERPROXY ~ # iostat -k -x -d 30
> > Linux 2.6.33.1-build-0051 (SUPERPROXY)  03/31/10        _i686_  (4 CPU)
> >
> > Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
> > sda               0.16     0.01    0.08    0.03     3.62     1.33    88.94     0.15 1389.89  59.15   0.66
> > sdb               4.14    61.25    6.22   25.55    44.52   347.21    24.66     2.24   70.60   2.36   7.49
> > sdc               4.37   421.28    9.95   98.31   318.27  2081.95    44.34    20.93  193.21   2.31  24.96
> > sdd               2.34   339.90    3.97  117.47    95.48  1829.52    31.70     1.73   14.23   8.09  98.20
> 
>                                                                                                 ^^^^  ^^^^^
> 
> /dev/sdd is IO bound doing small random writeback IO. A service time
> of 8ms implies that it is doing lots of large seeks. If you've got
> GBs of data to sync and that's the writeback pattern, then sync will
(Continue reading)

Dave Chinner | 1 Apr 2010 13:13

Re: endless sync on bdi_sched_wait()? 2.6.33.1

On Thu, Apr 01, 2010 at 01:42:42PM +0300, Denys Fedorysychenko wrote:
> That's correct, it is quite a busy cache server.
> 
> Well, if I stop squid (the cache), sync will finish fast enough.
> If I don't, it takes more than an hour. Actually I left that PC after 1 hour, and
> it hadn't finished yet. I don't think that is normal.
> Probably sync picks up new data and tries to flush it too, and before it finishes
> that, more data comes in.
> Actually all I need is to sync the config directory. I cannot use fsync,
> because there are multiple files opened earlier by other processes, and sync
> does the trick for this. I got a dead process, and the only fast way to recover the
> system is to kill the cache process, so the I/O pumping stops for a while and sync()
> has a chance to finish.
> Sure, there is a way to just "remount" the config partition ro, but I guess
> sync should flush only the current buffer cache pages.
> 
> I will do more tests now and will give exact numbers: how much time it needs
> with squid running, and if I kill squid shortly after running sync.

Ok. What would be interesting is regular output from /proc/meminfo
to see how the dirty memory is changing over the time the sync is
running....
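
One way to collect that is sketched below, under stated assumptions:
meminfo_field_kb() and read_meminfo() are hypothetical helper names, and
the parser only assumes the usual "Key:   value kB" line layout of
/proc/meminfo.

```c
#include <stdio.h>
#include <string.h>

/*
 * Return the numeric value of a "Key:   12345 kB" line in meminfo-style
 * text, or -1 if the key is absent or has no number after it.
 */
long meminfo_field_kb(const char *text, const char *key)
{
	size_t klen = strlen(key);
	const char *line = text;

	while (line && *line) {
		/* Match the whole key, so "Writeback" != "WritebackTmp". */
		if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
			long val;

			if (sscanf(line + klen + 1, "%ld", &val) == 1)
				return val;
			return -1;
		}
		line = strchr(line, '\n');
		if (line)
			line++;		/* step past the newline */
	}
	return -1;
}

/* Read /proc/meminfo into buf (NUL-terminated); returns 0 on success. */
int read_meminfo(char *buf, size_t size)
{
	FILE *fp = fopen("/proc/meminfo", "r");
	size_t n;

	if (!fp || size == 0) {
		if (fp)
			fclose(fp);
		return -1;
	}
	n = fread(buf, 1, size - 1, fp);
	buf[n] = '\0';
	fclose(fp);
	return 0;
}
```

Calling read_meminfo() once a second while the sync runs, and printing
meminfo_field_kb(buf, "Dirty") and meminfo_field_kb(buf, "Writeback")
each time, would show whether dirty memory is draining or being refilled
as fast as it is written back.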

Cheers,

Dave.
-- 
Dave Chinner
david <at> fromorbit.com

Michal Suchanek | 1 Apr 2010 17:36

Re: UnionMount status?

On 24 March 2010 00:02, Valerie Aurora <vaurora <at> redhat.com> wrote:
> On Fri, Mar 19, 2010 at 10:47:15PM +0100, Michal Suchanek wrote:
>> Hello
>>
>> On 19 March 2010 19:03, Valerie Aurora <vaurora <at> redhat.com> wrote:
>>
>> >
>> > Where union mounts is right now is in need of more review from VFS
>> > experts (and thanks to those who have already reviewed it).  I'm
>>
>> I don't count myself among VFS experts so I'm sorry if I am restating
>> or missing something obvious.
>
> Thanks for taking a look!

Thanks for taking the time to reply.

Apparently I have missed a few properties of the current state of
union mount, especially the fact that directories are only ever stored
in the top layer and the bottom directories are only accessed once
when the merged directory is created in the top layer.

It greatly simplifies lookup operations and avoids some problems with
getting stuck in the bottom layer when something else is already
visible on the top layer.

>
>> > rewriting the in-file copyup code right now, which is dependent on a
>> > lot of ongoing VFS work by Al Viro, Nick Piggin, Dmitriy Monakhov, and
>> > others.  Here's my description of the problem I'm currently working,
(Continue reading)

Jeff Moyer | 1 Apr 2010 22:14

Re: endless sync on bdi_sched_wait()? 2.6.33.1

Dave Chinner <david <at> fromorbit.com> writes:

> On Thu, Apr 01, 2010 at 01:42:42PM +0300, Denys Fedorysychenko wrote:
>> That's correct, it is quite a busy cache server.
>> 
>> Well, if I stop squid (the cache), sync will finish fast enough.
>> If I don't, it takes more than an hour. Actually I left that PC after 1 hour, and
>> it hadn't finished yet. I don't think that is normal.
>> Probably sync picks up new data and tries to flush it too, and before it finishes
>> that, more data comes in.
>> Actually all I need is to sync the config directory. I cannot use fsync,
>> because there are multiple files opened earlier by other processes, and sync
>> does the trick for this. I got a dead process, and the only fast way to recover the
>> system is to kill the cache process, so the I/O pumping stops for a while and sync()
>> has a chance to finish.
>> Sure, there is a way to just "remount" the config partition ro, but I guess
>> sync should flush only the current buffer cache pages.
>> 
>> I will do more tests now and will give exact numbers: how much time it needs
>> with squid running, and if I kill squid shortly after running sync.
>
> Ok. What would be interesting is regular output from /proc/meminfo
> to see how the dirty memory is changing over the time the sync is
> running....

This sounds familiar:

http://lkml.org/lkml/2010/2/12/41

Cheers,
(Continue reading)

Serge E. Hallyn | 2 Apr 2010 01:37

Re: [C/R v20][PATCH 00/96] Linux Checkpoint-Restart - v20

Quoting Oren Laadan (orenl <at> cs.columbia.edu):
> Hi Andrew,
> 
> Following up on the thread on the checkpoint-restart patch set
> (http://lkml.org/lkml/2010/3/1/422), the following series is the
> latest checkpoint/restart, based on 2.6.33.
> 
> The first 20 patches are cleanups and preparation for c/r; they
> are followed by the actual c/r code.
> 
> Please apply to -mm, and let us know if there is any way we can
> help.

Hi Andrew,

Oren sent v20 of the checkpoint/restart patchset out two weeks ago.
We've addressed some feedback from linux-fsdevel and added network and
pid namespace support.  So we could resend again now.  However we also
have a bigger patchset in the works which is feature-neutral, but moves
all the code out of linux-2.6/checkpoint/ and next to the code it
affects.  I anticipate #ifdef clashes though, so we'll
need to do quite a bit of various-config-and-arch testing of the new
code layout.  If you're at a good point to pull it, we can resend the
code as is now so as to get some wider testing exposure. Or, if you prefer,
we can wait until after the code move in case that would be seen as more
amenable to meaningful review.

We don't want to patch-bomb needlessly so thought we'd ask first :)

thanks,
(Continue reading)

tim | 2 Apr 2010 04:24

Re: [patch 1/2] fs: cleanup files_lock

Nick Piggin wrote:
> I would like to start sending bits of vfs scalability improvements to
> be reviewed and hopefully merged.
>
> I will start with files_lock. Last time this one came up, it was
> criticised because some hoped to get rid of files list, and because
> it didn't have enough justification of the scalability improvement.
>
> For the first criticism, it isn't any more difficult to rip out if
> we are ever able to remove files list. For the second, I have gathered
> some statistics and written better justification. Andi I believe is
> finding kbuild is becoming limited by files lock on larger systems.

I did some testing with a multi-threaded kernel compile
(skipping the link stage) on a 4-socket Nehalem-EX class
machine with 8 cores per socket.
In the past we have found
heavy contention on the files_lock. Nick's patch to
reorganize files_lock is beneficial.

With a 64-thread compile, the distribution of user,
system, idle and IO wait time improved, and the
details are as follows:

                us      sys     idle    IO wait (in %)  
2.6.34-rc2      51.25   28.25   17.25   3.25      
+nick's patch   53.75   18.5    19      8.75    

We spend about 10 percentage points less CPU in system time (28.25% down
to 18.5%), time that was previously spent contending for files_lock.
Contention on files_lock is gone from our profiling data.
(Continue reading)

Huang Shijie | 2 Apr 2010 11:37

[PATCH] namei.c : update mnt when it needed

Update the mnt of the path only when it is not equal to the new one.

Signed-off-by: Huang Shijie <shijie8 <at> gmail.com>
---
 fs/namei.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index a7dce91..9c3a040 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -523,9 +523,10 @@ static void path_put_conditional(struct path *path, struct nameidata *nd)
 static inline void path_to_nameidata(struct path *path, struct nameidata *nd)
 {
 	dput(nd->path.dentry);
-	if (nd->path.mnt != path->mnt)
+	if (nd->path.mnt != path->mnt) {
 		mntput(nd->path.mnt);
-	nd->path.mnt = path->mnt;
+		nd->path.mnt = path->mnt;
+	}
 	nd->path.dentry = path->dentry;
 }
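
As far as can be told from the hunk alone, the two forms leave identical
state behind, because when the mnt pointers are already equal the
unconditional assignment stores the value that is already there. The
user-space sketch below checks that on a toy model. All toy_* names are
hypothetical stand-ins, plain integer counters replace the real
reference counting, and nothing here is kernel code.

```c
/* Toy stand-ins (assumptions): bare refcounts instead of kernel objects. */
struct toy_mnt { int refcount; };
struct toy_dentry { int refcount; };
struct toy_path { struct toy_mnt *mnt; struct toy_dentry *dentry; };

void toy_mntput(struct toy_mnt *mnt) { mnt->refcount--; }
void toy_dput(struct toy_dentry *dentry) { dentry->refcount--; }

/* Shape of the function before the patch: unconditional mnt assignment. */
void toy_path_to_nd_old(const struct toy_path *path, struct toy_path *nd)
{
	toy_dput(nd->dentry);
	if (nd->mnt != path->mnt)
		toy_mntput(nd->mnt);
	nd->mnt = path->mnt;
	nd->dentry = path->dentry;
}

/* Shape of the function after the patch: assign mnt only when it differs. */
void toy_path_to_nd_new(const struct toy_path *path, struct toy_path *nd)
{
	toy_dput(nd->dentry);
	if (nd->mnt != path->mnt) {
		toy_mntput(nd->mnt);
		nd->mnt = path->mnt;
	}
	nd->dentry = path->dentry;
}

struct toy_result {
	int points_at_new;	/* nd ended up equal to path */
	int old_mnt_refs;	/* refcount left on nd's original mnt */
	int old_dentry_refs;	/* refcount left on nd's original dentry */
};

/* Run one variant on a fresh setup; same_mount selects the equal-mnt case. */
struct toy_result toy_run(void (*fn)(const struct toy_path *, struct toy_path *),
			  int same_mount)
{
	struct toy_mnt old_mnt = { 2 }, other_mnt = { 2 };
	struct toy_dentry old_dentry = { 2 }, new_dentry = { 2 };
	struct toy_path nd = { &old_mnt, &old_dentry };
	struct toy_path path = { same_mount ? &old_mnt : &other_mnt, &new_dentry };
	struct toy_result res;

	fn(&path, &nd);
	res.points_at_new = (nd.mnt == path.mnt && nd.dentry == path.dentry);
	res.old_mnt_refs = old_mnt.refcount;
	res.old_dentry_refs = old_dentry.refcount;
	return res;
}

/* Returns 1 when both variants leave identical state behind. */
int toy_variants_agree(int same_mount)
{
	struct toy_result a = toy_run(toy_path_to_nd_old, same_mount);
	struct toy_result b = toy_run(toy_path_to_nd_new, same_mount);

	return a.points_at_new == b.points_at_new &&
	       a.old_mnt_refs == b.old_mnt_refs &&
	       a.old_dentry_refs == b.old_dentry_refs;
}
```

In this model toy_variants_agree() returns 1 for both the same-mount and
the different-mount case, so the change reads as a cleanup that skips a
redundant store, not as a behavioural fix.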

-- 
1.6.6