Jesper Juhl | 1 Jul 01:16 2007

[PATCH][XFS][resend] fix memory leak in xfs_inactive()

(This is back from May 16 2007; resending, since it doesn't look like
the patch ever got merged anywhere.)

The Coverity checker found a memory leak in xfs_inactive().

The offending code is this bit:

1671            tp = xfs_trans_alloc(mp, XFS_TRANS_INACTIVE);

At conditional (1): "truncate != 0" taking true path

1672            if (truncate) {
1673                    /*
1674                     * Do the xfs_itruncate_start() call before
1675                     * reserving any log space because itruncate_start
1676                     * will call into the buffer cache and we can't
1677                     * do that within a transaction.
1678                     */
1679                    xfs_ilock(ip, XFS_IOLOCK_EXCL);
1680
1681                    error = xfs_itruncate_start(ip, XFS_ITRUNC_DEFINITE, 0);

At conditional (2): "error != 0" taking true path

1682                    if (error) {
1683                            xfs_iunlock(ip, XFS_IOLOCK_EXCL);

Event leaked_storage: Returned without freeing storage "tp"
Also see events: [alloc_fn][var_assign]
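
(The message is truncated here, but the fix implied by the trace is the
same one-liner as in the follow-up patch below: cancel the transaction
before bailing out of the error path. A sketch only; the context lines
are reconstructed from the trace above, and the return statement is an
assumption, since the trace just says the function returns here:)

 		error = xfs_itruncate_start(ip, XFS_ITRUNC_DEFINITE, 0);
 		if (error) {
+			xfs_trans_cancel(tp, 0);	/* free 'tp' instead of leaking it */
 			xfs_iunlock(ip, XFS_IOLOCK_EXCL);
 			return VN_INACTIVE_CACHE;
 		}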


Jesper Juhl | 1 Jul 01:16 2007

[PATCH][XFS][resend] memory leak; allocated transaction not freed in xfs_inactive_free_eofblocks() in the failure case.

(This is back from May 16 2007; resending, since it doesn't look like
the patch ever got merged anywhere.)

Fix XFS memory leak; allocated transaction not freed in xfs_inactive_free_eofblocks() in the failure case.

The code allocates a transaction, but in the case where 'truncate' is
!= 0 and xfs_itruncate_start(ip, XFS_ITRUNC_DEFINITE, 0) happens to return
an error, we'll just return from the function without dealing with the
memory allocated by xfs_trans_alloc() and assigned to 'tp', so it will be
orphaned/leaked - not good.

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
--- 
 fs/xfs/xfs_vnodeops.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/xfs/xfs_vnodeops.c b/fs/xfs/xfs_vnodeops.c
index de17aed..32519cf 100644
--- a/fs/xfs/xfs_vnodeops.c
+++ b/fs/xfs/xfs_vnodeops.c
@@ -1260,6 +1260,7 @@ xfs_inactive_free_eofblocks(
 		error = xfs_itruncate_start(ip, XFS_ITRUNC_DEFINITE,
 				    ip->i_size);
 		if (error) {
+			xfs_trans_cancel(tp, 0);
 			xfs_iunlock(ip, XFS_IOLOCK_EXCL);
 			return error;
 		}


David Chinner | 2 Jul 00:43 2007

Re: xfs_fsr, performance related tweaks

On Sat, Jun 30, 2007 at 01:17:29PM -0400, Eric Sandeen wrote:
> David Chinner wrote:
> > On Fri, Jun 29, 2007 at 08:16:28AM +0100, Just Marc wrote:
> >> David,
> >>
> >> In my first post I already said something like that can be done but it's 
> >> just an ugly hack.  Don't you think it would best be handled cleanly 
> >> and correctly by fsr itself?
> > 
> > No, I don't - if you want files not to be defragmented, then you
> > have to set the flags yourself in some way. You have a specific need
> > that can be solved by some scripting to describe your defrag/no
> > defrag policy. xfs_fsr has no place in setting defrag policy; its
> > function is simply to find and defrag files.
> 
> I wouldn't mind seeing a way to tell fsr to not worry about defragging
> some files based on current layout; say if the avg extent in the file
> is >100MB, or >1G, don't bother... if today you have a 4.7G DVD iso image
> in 3 extents (not bad) fsr will try to "fix" it for you right?

That could be easily done with command line options, I think. Define
the minimum extent length or number of extents we want files to have
and ignore those that fall outside those criteria; something like the
sketch below.
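
For illustration, a hypothetical pre-filter along those lines (no such
option exists in xfs_fsr today; names and thresholds are made up):

	/*
	 * Skip files whose average extent length already exceeds a
	 * user-set threshold.  With min_extent_len = 100MB, a 4.7GB
	 * ISO in 3 extents (avg ~1.6GB per extent) would be left
	 * alone rather than "fixed".
	 */
	static int
	fsr_skip_file(off64_t size, int nextents, off64_t min_extent_len)
	{
		off64_t avg = nextents > 0 ? size / nextents : size;

		return avg >= min_extent_len;
	}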

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer

David Chinner | 2 Jul 00:55 2007

Re: [PATCH 4/7][TAKE5] support new modes in fallocate

On Sat, Jun 30, 2007 at 11:21:11AM +0100, Christoph Hellwig wrote:
> On Tue, Jun 26, 2007 at 04:02:47PM +0530, Amit K. Arora wrote:
> > > Can you clarify - what is the current behaviour when ENOSPC (or some other
> > > error) is hit?  Does it keep the current fallocate() or does it free it?
> > 
> > Currently it is left on the file system implementation. In ext4, we do
> > not undo preallocation if some error (say, ENOSPC) is hit. Hence it may
> > end up with partial (pre)allocation. This is in line with dd and
> > posix_fallocate, which also do not free the partially allocated space.
> 
> I can't find anything in the specification of posix_fallocate
> (http://www.opengroup.org/onlinepubs/009695399/functions/posix_fallocate.html)
> that tells what should happen to allocated blocks on error.

Yeah, and AFAICT glibc leaves them behind ATM.

> But common sense would be to not leak disk space on failure of this
> syscall, and this definitively should not be left up to the filesystem,
> either we always leak it or always free it, and I'd strongly favour
> the latter variant.

We can't simply walk the range and remove unwritten extents, as some
of them may have been present before the fallocate() call. That makes
it extremely difficult to undo a failed call without also removing
pre-existing preallocations.
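
To make that concrete, consider a hypothetical extent map for the range
being fallocate()d:

	before the call:    [ 0-20MB unwritten ][ 20-100MB hole ]
	failed half-way:    [ 0-20MB unwritten ][ 20-60MB unwritten ][ 60-100MB hole ]

An undo that simply removed every unwritten extent in the range would
also throw away the 0-20MB preallocation that existed before the call;
telling the two apart would require remembering the pre-call extent map.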

Given the current behaviour for posix_fallocate() in glibc, I think
that retaining the same error semantic and punting the cleanup to
userspace (where the app will fail with ENOSPC anyway) is the only
sane thing we can do here. Trying to undo this in the kernel leads

Michal Marek | 2 Jul 11:40 2007

Re: [patch 3/3] Fix XFS_IOC_FSBULKSTAT{,_SINGLE} and XFS_IOC_FSINUMBERS in compat mode

David Chinner wrote:
> On Tue, Jun 19, 2007 at 03:25:52PM +0200, mmarek@suse.cz wrote:
>> +static int xfs_bulkstat_one_compat(
>> +	xfs_mount_t	*mp,		/* mount point for filesystem */
>> +	xfs_ino_t	ino,		/* inode number to get data for */
>> +	void		__user *buffer,	/* buffer to place output in */
>> +	int		ubsize,		/* size of buffer */
>> +	void		*private_data,	/* my private data */
>> +	xfs_daddr_t	bno,		/* starting bno of inode cluster */
>> +	int		*ubused,	/* bytes used by me */
>> +	void		*dibuff,	/* on-disk inode buffer */
>> +	int		*stat)		/* BULKSTAT_RV_... */
> 
> Hmmm - this is almost all duplicated code. It's pretty much what I
> described, but maybe not /quite/ what I had in mind here. It's a
> *big* improvement on the first version, but it seems now that the
> only real difference between xfs_bulkstat_one() and
> xfs_bulkstat_one_compat() is copy_to_user() vs the discrete put_user
> calls.
> 
> I think we can remove xfs_bulkstat_one_compat() completely by using
> the same method you used with the xfs_inumber_fmt functions.

You mean xfs_ioc_bulkstat_compat() -> native xfs_bulkstat() -> native
xfs_bulkstat_one() -> a compat output formatter, so a
pointer-to-function passed to pointer-to-function? ;) I could (ab)use
the void *private_data arg which xfs_bulkstat passes unmodified to the
formatter; xfs_bulkstat_one() doesn't make use of it atm. I'll try it
and post the result in a while.
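
Roughly this shape (illustrative only; these names and signatures are
made up, not the real XFS ones):

	struct bstat;			/* the native stat record */

	typedef int (*bstat_fmt_t)(void __user *ubuffer, int ubsize,
				   int *ubused, const struct bstat *bs);

	/* stays native; just forwards private_data to the formatter */
	static int
	bulkstat_one(struct bstat *bs, void __user *ubuffer, int ubsize,
		     int *ubused, void *private_data)
	{
		bstat_fmt_t formatter = (bstat_fmt_t)private_data;

		/* the native ioctl passes a copy_to_user()-based
		 * formatter, the compat ioctl one that writes the
		 * compat layout; this function no longer cares which */
		return formatter(ubuffer, ubsize, ubused, bs);
	}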


Tejun Heo | 2 Jul 12:56 2007

Re: [linux-lvm] 2.6.22-rc5 XFS fails after hibernate/resume

David Greaves wrote:
>> Tejun Heo wrote:
>>> It's really weird tho.  The PHY RDY status changed events are coming
>>> from the device which is NOT used while resuming
> 
> There is an obvious problem there though Tejun (the errors even when sda
> isn't involved in the OS boot) - can I start another thread about that
> issue/bug later? I need to reshuffle partitions so I'd rather get the
> hibernate working first and then go back to it if that's OK?

Yeah, sure.  The problem is that we don't know whether or how those two
are related.  It would be great if there were a way to verify that the
memory image read back from hibernation is intact.  Rafael, any ideas?

Thanks.

-- 
tejun

Amit K. Arora | 2 Jul 13:47 2007

Re: [PATCH 4/7][TAKE5] support new modes in fallocate

On Mon, Jul 02, 2007 at 08:55:43AM +1000, David Chinner wrote:
> On Sat, Jun 30, 2007 at 11:21:11AM +0100, Christoph Hellwig wrote:
> > On Tue, Jun 26, 2007 at 04:02:47PM +0530, Amit K. Arora wrote:
> > > > Can you clarify - what is the current behaviour when ENOSPC (or some other
> > > > error) is hit?  Does it keep the current fallocate() or does it free it?
> > > 
> > > Currently it is left on the file system implementation. In ext4, we do
> > > not undo preallocation if some error (say, ENOSPC) is hit. Hence it may
> > > end up with partial (pre)allocation. This is in line with dd and
> > > posix_fallocate, which also do not free the partially allocated space.
> > 
> > I can't find anything in the specification of posix_fallocate
> > (http://www.opengroup.org/onlinepubs/009695399/functions/posix_fallocate.html)
> > that tells what should happen to allocated blocks on error.
> 
> Yeah, and AFAICT glibc leaves them behind ATM.

Yes, it does.

> > But common sense would be to not leak disk space on failure of this
> > syscall, and this definitively should not be left up to the filesystem,
> > either we always leak it or always free it, and I'd strongly favour
> > the latter variant.

I would not call it a "leak", since the blocks which got allocated as
part of the partial success of the fallocate syscall can be strictly
accounted for (i.e. they are assigned to a particular inode). And these
can be freed by the application, using a suitable @mode of fallocate
(see the sketch below).
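
For illustration, one way an application could do that cleanup today for
the common extend-past-EOF case (a sketch only: it truncates back to the
old size on failure; freeing preallocated blocks inside the old i_size
would need the deallocation mode discussed in this series):

	#include <errno.h>
	#include <fcntl.h>
	#include <sys/stat.h>
	#include <unistd.h>

	int alloc_or_rollback(int fd, off_t offset, off_t len)
	{
		struct stat st;
		int err;

		if (fstat(fd, &st) < 0)
			return -1;

		err = posix_fallocate(fd, offset, len);
		if (err) {
			/* partial allocation may remain; drop anything
			 * that was allocated past the old EOF */
			if (offset + len > st.st_size)
				ftruncate(fd, st.st_size);
			errno = err;
			return -1;
		}
		return 0;
	}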

> We can't simply walk the range and remove unwritten extents, as some

David Chinner | 2 Jul 15:01 2007

Re: After reboot fs with barrier faster deletes than fs with nobarrier

On Fri, Jun 29, 2007 at 01:01:36PM +0100, Szabolcs Illes wrote:
> on my desktop pc:
> 
> WCE=1

Hot-cache/cold-cache: nobarrier is faster. Same results as I got.
After reboot, nobarrier is way slow.

> WCE=0

Hot-cache/cold-cache is pretty much identical. Same results as I got.
After reboot, nobarrier is way slow.

> for cold cache I used: echo 3 > /proc/sys/vm/drop_caches

Same here.

> it looks like this machine is only affected after reboot; maybe the hdd
> has more cache than the hdd in my 3-year-old laptop.
> On my laptop it was enough to clear the kernel cache.
> 
> How did you do your "cold" tests? reboot or drop_caches?

drop_caches.
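
(For reference, the usual idiom, with a sync first since drop_caches
only drops clean pages:)

	sync
	echo 3 > /proc/sys/vm/drop_caches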

So there definitely appears to be something about a reboot that is
causing an issue here. I'll see if I can reproduce it here.

Cheers,


Rafael J. Wysocki | 2 Jul 16:08 2007

Re: [linux-lvm] 2.6.22-rc5 XFS fails after hibernate/resume

On Monday, 2 July 2007 12:56, Tejun Heo wrote:
> David Greaves wrote:
> >> Tejun Heo wrote:
> >>> It's really weird tho.  The PHY RDY status changed events are coming
> >>> from the device which is NOT used while resuming
> > 
> > There is an obvious problem there though Tejun (the errors even when sda
> > isn't involved in the OS boot) - can I start another thread about that
> > issue/bug later? I need to reshuffle partitions so I'd rather get the
> > hibernate working first and then go back to it if that's OK?
> 
> Yeah, sure.  The problem is that we don't know whether or how those two
> are related.  It would be great if there were a way to verify that the
> memory image read back from hibernation is intact.  Rafael, any ideas?

Well, s2disk has an option to compute an MD5 checksum of the image during
the hibernation and verify it while reading the image.  Still, s2disk/resume
aren't very easy to install and configure ...

Greetings,
Rafael

-- 
"Premature optimization is the root of all evil." - Donald Knuth

David Greaves | 2 Jul 16:32 2007

Re: [linux-lvm] 2.6.22-rc5 XFS fails after hibernate/resume

Rafael J. Wysocki wrote:
> On Monday, 2 July 2007 12:56, Tejun Heo wrote:
>> David Greaves wrote:
>>>> Tejun Heo wrote:
>>>>> It's really weird tho.  The PHY RDY status changed events are coming
>>>>> from the device which is NOT used while resuming
>>> There is an obvious problem there though Tejun (the errors even when sda
>>> isn't involved in the OS boot) - can I start another thread about that
>>> issue/bug later? I need to reshuffle partitions so I'd rather get the
>>> hibernate working first and then go back to it if that's OK?
>> Yeah, sure.  The problem is that we don't know whether or how those two
>> are related.  It would be great if there were a way to verify that the
>> memory image read back from hibernation is intact.  Rafael, any ideas?
> 
> Well, s2disk has an option to compute an MD5 checksum of the image during
> the hibernation and verify it while reading the image.
(Assuming you mean the mainline version)

Sounds like a good thing to try next...
Couldn't see anything on this in ../Documentation/power/*
How do I enable it?

>  Still, s2disk/resume
> aren't very easy to install and configure ...

I have it working fine on 2 other machines now so that doesn't appear to be a 
problem.

David


