Jan Hudec | 1 Dec 2004 08:16

Re: [PATCH] [Request for inclusion] Filesystem in Userspace

On Tue, Nov 30, 2004 at 22:13:27 +0100, Miklos Szeredi wrote:
> 
> > you're describing the deadlock here: all memory is full, no process 
> > which allocates memory can make any progress.
> 
> Yes, they can: the allocation will fail, function will return -ENOMEM,
> malloc will return NULL, pagefault will fail with OOM.  This is
> progress, though not the best sort.  It is most certainly _not_ a
> deadlock.

Allocation won't fail! There's overcommit! Pagefault won't OOM, because
it will wait for the pages to get laundered. And the pages won't get
laundered until the pagefault succeeds. (Yes, I know that you are going
to mark the pages as dirty again so the pagefault won't wait for them,
but you have to mention it.)

-------------------------------------------------------------------------------
						 Jan 'Bulb' Hudec <bulb <at> ucw.cz>

Miklos Szeredi | 1 Dec 2004 14:35

Re: [PATCH] [Request for inclusion] Filesystem in Userspace


> > Yes, they can: the allocation will fail, function will return -ENOMEM,
> > malloc will return NULL, pagefault will fail with OOM.  This is
> > progress, though not the best sort.  It is most certainly _not_ a
> > deadlock.
> 
> Allocation won't fail! There's overcommit! Pagefault won't OOM, because
> it will wait for the pages to get laundered. And the pages won't get
> laundered until the pagefault succeeds. (Yes, I know that you are going
> to mark the pages as dirty again so the pagefault won't wait for them,
> but you have to mention it.)

You didn't read the thread.  I was talking about the page not being
counted as dirty in the first place (bdi->memory_backed = 1).

If you want to see a machine out of physical memory (you can have
plenty of free swap), just try filling up a ramfs filesystem.  Don't
do it on the company's mission critical server though, cause some
people might be unhappy afterwards.

I tried it, and it's not very nice.  Even the OOM killer went to work
though swap was far from full.  And the end result was a perfectly
responsive, but not very useful system.

So please don't try to tell me that:

  a) it will deadlock: it won't, not even if userspace calls back,
  because the memory is _not_ reclaimable

  b) it's not a good solution: I _know_, all I'm trying to show that a
[...]

Steve French | 2 Dec 2004 02:01

O_DIRECT

Is there a precise definition of the Linux O_DIRECT semantics, in
particular addressing desired behavior in the cases of:
1) inodes opened more than once (some with and some without O_DIRECT) - 
do the other open file instances get disabled caching?
2) consistency of mmap data and sendfile data (how could this work 
without write through of the page cache?) when inode is opened O_DIRECT

If the goal of O_DIRECT is not only to bypass the local client's page 
cache, but also to bypass the 4K read/write page size (and allow larger 
reads and writes), is it acceptable for an fs to handle files opened
O_DIRECT to disable caching by turning off calls to generic_file_read 
and generic_file_write for reads and writes to that inode (after a 
sync)?  If not, is there a conventional "no cache" mount option used by 
other filesystems to do the equivalent on a mounted volume?

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Lever, Charles | 2 Dec 2004 03:50

RE: O_DIRECT

steve-

see the NFS implementation of O_DIRECT in 2.6.  fs/nfs/{direct,file}.c

i use a modified version of "fsx" that does O_DIRECT standard I/O along
with mmap'd I/O through the page cache to test the NFS O_DIRECT
implementation.  each file descriptor is either cached or direct.
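The "each file descriptor is either cached or direct" point can be sketched
in userspace as below. This is only an illustration, not code from fsx or
from fs/nfs: the helper name read_start() and its parameters are made up
here, and the alignment that O_DIRECT requires for the buffer, offset and
length depends on the filesystem and device, so the caller has to supply a
suitable value (4096 is a common guess, not a guarantee).

```c
#define _GNU_SOURCE             /* glibc only exposes O_DIRECT with this set */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: read up to `len` bytes from offset 0 of `path`.
 * With use_direct != 0 the descriptor is opened O_DIRECT, so this read
 * bypasses the page cache; with use_direct == 0 the same file is read
 * through the page cache. The choice is made per open(), not per inode.
 *
 * `align` must be a power of two and a multiple of sizeof(void *);
 * for O_DIRECT, `len` should also be a multiple of it.
 * On success returns the byte count and stores a buffer in *out that the
 * caller must free(). On any failure returns -1 and sets *out to NULL.
 */
ssize_t read_start(const char *path, int use_direct,
                   size_t align, size_t len, void **out)
{
    void *buf = NULL;
    int flags = O_RDONLY | (use_direct ? O_DIRECT : 0);
    int fd;
    ssize_t n;

    assert(out != NULL);
    *out = NULL;

    /* O_DIRECT transfers need an aligned user buffer. */
    if (posix_memalign(&buf, align, len) != 0)
        return -1;

    fd = open(path, flags);
    if (fd < 0) {
        free(buf);
        return -1;
    }

    n = pread(fd, buf, len, 0);
    close(fd);
    if (n < 0) {
        free(buf);
        return -1;
    }

    *out = buf;
    return n;
}
```

Calling it twice on the same file, once with use_direct set and once
without, gives one direct and one cached descriptor on the same inode,
which is the mixed case the original question asks about. Some filesystems
reject O_DIRECT at open time, in which case the direct call returns -1.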

> -----Original Message-----
> From: Steve French [mailto:smfltc <at> us.ibm.com] 
> Sent: Wednesday, December 01, 2004 8:01 PM
> To: linux-fsdevel <at> vger.kernel.org
> Subject: O_DIRECT
> 
> 
> Is there a precise definition of the Linux O_DIRECT semantics, in
> particular addressing desired behavior in the cases of:
> 1) inodes opened more than once (some with and some without O_DIRECT) -
> do the other open file instances get disabled caching?
> 2) consistency of mmap data and sendfile data (how could this work
> without write through of the page cache?) when inode is opened O_DIRECT
>
> If the goal of O_DIRECT is not only to bypass the local client's page
> cache, but also to bypass the 4K read/write page size (and allow larger
> reads and writes), is it acceptable for an fs to handle files opened
> O_DIRECT to disable caching by turning off calls to generic_file_read
> and generic_file_write for reads and writes to that inode (after a
[...]

Shaya Potter | 2 Dec 2004 04:12

badly authored udf file systems

I have a couple of DVDs that seem to have been authored badly, such that
when they are mounted, none of the directories have the execute bit set.
This means that only root can enter the VIDEO_TS folder and hence only
root can play the DVD.

in looking at the ecma spec
(http://www.ecma-international.org/publications/standards/Ecma-167.htm)
and section 4/14.9 where the file entry record is discussed, it shows
that the permission scheme sort of mirrors unix (i.e. owner, group,
other and read/write/execute bits).  However, the spec is ambiguous
because when it refers to the execute bit, it doesn't talk about
directories at all.  In normal unix, of course one needs the execute bit
set, however, it's probable other systems don't have such a semantic and
hence buggy dvd authoring programs on those platforms don't check for
it.

Would it be useful to have a file system option, something along the
lines of "buggy_dvd", which automatically gives all directories a
0111 (octal) bump?

I ran into this w/ gnome's gnome-volume-manager which automatically
mounted the dvd on insertion, but wasn't able to play it unless the dvd
player was running as root.

thanks,

shaya


Jamie Lokier | 2 Dec 2004 11:03

Re: badly authored udf file systems

Shaya Potter wrote:
> in looking at the ecma spec
> (http://www.ecma-international.org/publications/standards/Ecma-167.htm)
> and section 4/14.9 where the file entry record is discussed, it shows
> that the permission scheme sort of mirrors unix (i.e. owner, group,
> other and read/write/execute bits).  However, the spec is ambiguous
> because when it refers to the execute bit, it doesn't talk about
> directories at all.  In normal unix, of course one needs the execute bit
> set, however, it's probable other systems don't have such a semantic and
> hence buggy dvd authoring programs on those platforms don't check for
> it.
>
> Would it be useful to have a file system option to specify something
> along the lines "buggy_dvd" which automatically gives all directories a
> 0111 (octal) bump?

If the spec doesn't talk about directory execute permissions at all,
and it looks like that is intended, then shouldn't the 0111 bump be
done all the time for UDF?  More exactly, adding execute permissions
corresponding to whichever read permissions are set.

-- Jamie

Shaya Potter | 2 Dec 2004 14:22

Re: badly authored udf file systems

On Thu, 2004-12-02 at 10:03 +0000, Jamie Lokier wrote:
> Shaya Potter wrote:
> > in looking at the ecma spec
> > (http://www.ecma-international.org/publications/standards/Ecma-167.htm)
> > and section 4/14.9 where the file entry record is discussed, it shows
> > that the permission scheme sort of mirrors unix (i.e. owner, group,
> > other and read/write/execute bits).  However, the spec is ambiguous
> > because when it refers to the execute bit, it doesn't talk about
> > directories at all.  In normal unix, of course one needs the execute bit
> > set, however, it's probable other systems don't have such a semantic and
> > hence buggy dvd authoring programs on those platforms don't check for
> > it.
> >
> > Would it be useful to have a file system option to specify something
> > along the lines "buggy_dvd" which automatically gives all directories a
> > 0111 (octal) bump?
> 
> If the spec doesn't talk about directory execute permissions at all,
> and it looks like that is intended, then shouldn't the 0111 bump be
> done all the time for UDF?  More exactly, adding execute permissions
> corresponding to whichever read permissions are set.

perhaps, I didn't read through the entire spec, so don't know if
directory permissions are covered elsewhere (though a search of the pdf
for "directory permission" didn't find anything), just the part talking
about permissions.

specifically
---
14.9.5 Permissions (BP 44)
[...]

Steve French | 3 Dec 2004 19:06

Kernel panic - not syncing: Attempting to free lock with active block list

Does anyone know what the intent of this kernel message is, i.e. what it
is supposed to mean:

Kernel panic - not syncing: Attempting to free lock with active block
list

It started showing up running locktests over cifs only after some byte
range locking changes were made to the VFS (outside cifs) a few months
ago.

Daniel Phillips | 3 Dec 2004 23:07

Re: [PATCH] [Request for inclusion] Filesystem in Userspace

Hi Avi,

On Tuesday 30 November 2004 16:37, Avi Kivity wrote:
> The situation with userspace filesystems is:
>
>   some process allocates memory, blocking on kswapd as memory is full
>   kswapd calls userspace filesystem to free memory
>   userspace filesystem calls kernel, which allocates memory and blocks
> on kswapd
>   eventually all processes in the system block on kswapd
>
> I have observed (and fixed) this on a real system.

What was your fix?

Regards,

Daniel

Daniel Phillips | 3 Dec 2004 23:07

Re: [PATCH] [Request for inclusion] Filesystem in Userspace

Hi Rik,

On Saturday 27 November 2004 12:07, Rik van Riel wrote:
> On Fri, 19 Nov 2004, Miklos Szeredi wrote:
> > The solution I'm thinking is along the lines of accounting the number
> > of writable pages assigned to FUSE filesystems.  Limiting this should
> > solve the deadlock problem.  This would only impact performance for
> > shared writable mappings, which are rare anyway.
>
> Note that NFS, and any filesystems on iSCSI or g/e/nbd block
> devices have the exact same problem.  To explain why this is
> the case, let's start with the VM allocation and pageout
> thresholds:
>
>    pages_min ------------------
>
>   GFP_ATOMIC ------------------
>
>  PF_MEMALLOC ------------------
>
>            0 ------------------
>
> When writing out a dirty page, the pageout code is allowed
> to allocate network buffers down to the PF_MEMALLOC boundary.
>
> However, when receiving the ACK network packets from the server,
> the network stack is only allowed to allocate memory down to the
> GFP_ATOMIC watermark.
>
> This means it is relatively easy to get the system to deadlock,
[...]
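The threshold argument quoted above can be put into a toy model. Everything
in it is an assumption made for illustration: the type names, the function
names and the idea of a single free-page counter are not the kernel's
allocator, and the quoted text gives no actual numbers. It only encodes the
ordering in the diagram: normal allocations stop at pages_min, GFP_ATOMIC
allocations may go lower, and PF_MEMALLOC allocations may go lower still.

```c
#include <stdbool.h>

/* Hypothetical allocation contexts, named after the quoted thresholds. */
enum alloc_ctx {
    CTX_NORMAL,     /* ordinary allocation, must stay above pages_min */
    CTX_ATOMIC,     /* GFP_ATOMIC, e.g. the network receive path      */
    CTX_MEMALLOC    /* PF_MEMALLOC, e.g. the pageout path             */
};

/* Hypothetical floors, in pages; expected order: min > atomic > memalloc. */
struct watermarks {
    unsigned long pages_min;
    unsigned long atomic_floor;
    unsigned long memalloc_floor;
};

static unsigned long floor_for(const struct watermarks *wm, enum alloc_ctx ctx)
{
    switch (ctx) {
    case CTX_ATOMIC:
        return wm->atomic_floor;
    case CTX_MEMALLOC:
        return wm->memalloc_floor;
    default:
        return wm->pages_min;
    }
}

/* An allocation succeeds only while free memory is above the context's floor. */
bool may_allocate(const struct watermarks *wm, unsigned long free_pages,
                  enum alloc_ctx ctx)
{
    return free_pages > floor_for(wm, ctx);
}

/*
 * The window described in the quote: pageout can still allocate buffers to
 * send the dirty page, but the receive path cannot allocate for the ACK.
 */
bool in_deadlock_window(const struct watermarks *wm, unsigned long free_pages)
{
    return may_allocate(wm, free_pages, CTX_MEMALLOC) &&
           !may_allocate(wm, free_pages, CTX_ATOMIC);
}
```

With made-up floors of 1024, 512 and 128 pages, a system with 300 free
pages is inside that window: the write goes out, but the acknowledgement
that would let the page be freed cannot be received.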
