YAMAMOTO Takashi | 3 Oct 06:54 2011

Re: RFC: SEEK_DATA/SEEK_HOLE implementation version 2

hi,

> Hi folks,
> 
> after getting stuck in the 1st implementation in the rump/puffs/refuse jungle
> i started a new version that is more in line with the Solaris implementation
> and is far less invasive.
> 
> Basically the system call forwards the requests using ioctls, just like Solaris
> and, as it turns out, also FreeBSD with their ZFS import. For simplicity and
> to reduce compat stuff i've used the same ioctls FreeBSD defines. FreeBSD's
> support is limited though; only ZFS handles them. The ioctl names are not
> documented yet.
> 
> The new implementation presents the default one-blob for file systems that
> don't implement it. For NetBSD it's currently implemented for UFS and is tested
> for FFS with/without WAPBL, ext2fs and lfs. It is present in our ZFS import
> but apparently still disabled and i don't have a ZFS partition to play with. I
> might be tempted to try it later on my scratch machine :) UDF is next but
> shouldn't be that difficult.

why is the VOP_FSYNC call necessary?

YAMAMOTO Takashi

> 
> What remains is making the userland tools aware of it and using it, but that's phase 2.
> 
> With regards,
> Reinoud

Alan Barrett | 3 Oct 08:33 2011

Re: RFC: SEEK_DATA/SEEK_HOLE implementation version 2

On Wed, 17 Aug 2011, Reinoud Zandijk wrote:
> after getting stuck in the 1st implementation in the 
> rump/puffs/refuse jungle i started a new version that is more in 
> line with the Solaris implementation and is far less invasive.
>
> Basically the system call forwards the requests using ioctls, just 
> like Solaris and, as it turns out, also FreeBSD with their ZFS 
> import. For simplicity and to reduce compat stuff i've used the 
> same ioctls FreeBSD defines. FreeBSD's support is limited though; 
> only ZFS handles them. The ioctl names are not documented yet.

So, if I am reverse engineering the code correctly, the design is 
like this:

   There are no new VOP calls.

   There are two new ioctls, FIOSEEKDATA and FIOSEEKHOLE.  Each
   file system may provide its own implementation.  If the
   underlying file system doesn't support them, then they fail.

   There are two new lseek 'whence' flags, SEEK_DATA and SEEK_HOLE.
   The kernel's lseek implementation forwards them to the
   underlying file system using VOP_IOCTL(FIOSEEKDATA) and
   VOP_IOCTL(FIOSEEKHOLE).  If the ioctl fails, then lseek
   implements the fallback behaviour of treating the file as a
   single data region followed by a hole after the end of file.

I think that it would be better to implement the fallback behaviour in
the vfs layer rather than in the lseek syscall.


Reinoud Zandijk | 3 Oct 09:27 2011

Re: RFC: SEEK_DATA/SEEK_HOLE implementation version 2

Hi!

On Mon, Oct 03, 2011 at 04:54:29AM +0000, YAMAMOTO Takashi wrote:
> > The new implementation presents the default one-blob for file systems that
> > don't implement it. For NetBSD it's currently implemented for UFS and is
> > tested for FFS with/without WAPBL, ext2fs and lfs. It is present in our
> > ZFS import but apparently still disabled and i don't have a ZFS partition to
> > play with. I might be tempted to try it later on my scratch machine :) UDF
> > is next but shouldn't be that difficult.
> 
> why is the VOP_FSYNC call necessary?

The sparse region search code depends on the indirect blocks being correctly
written out, as it traverses them. If the file is still `dirty', all the
indirect blocks are present as negative indices, so the normal FFS code works,
but the same indirect blocks, when addressed by their disc addresses, are not
up to date.

The FFS sparse region search code uses the indirect blocks to see where
actual data is recorded, and so needs them to be up to date. A range sync
of only the negative range might also suffice, but since most if not all
applications of this code deal with backup/processing, the VOP_FSYNC() is
normally a NOP.

I hope this explanation helps :)

With regards,
Reinoud


Erik Fair | 3 Oct 09:40 2011

UNIX kernel notification system

Ah, Matt, now you've stepped in it: UNIX kernel notifications, and a model for that. A topic that I glossed
over in my previous note.

There are three basic modes of UNIX use:

1. traditional multi-user timesharing system. We in NetBSD land still use our systems this way sometimes;
cf. ftp.netbsd.org

2. single-user workstation (this includes laptops) and that user works on the "console" (or "graphics
device" + "input devices").

3. lights-out server in some dark closet or data center. Console? Uh, yeah, it's over there, on top of the KVM
or serial switch; keyboard's in the drawer. Be sure to blow the dust off it before you use it.

We ought to try and come up with a notification abstraction model that works reasonably well for each use
case, and preferably one which permits automated userland software response to various common events.

So, historically, we have the venerable serial console (or some simulation thereof) onto which the kernel
can printf, with the expectation (well, assumption) that an "operator" is watching (reading), and will
respond; born of that time when UNIX systems were too heavy for a single human to lift, kept in well
air-conditioned glass rooms, with noisy LA-120 DECwriter II consoles ... or if we go way back, *really*
noisy TTY model ASR-33's. With real paper made from trees!

A little later, we faked that with xterm -C or xconsole. Or the kernel just blatted all over your frame buffer
in large font, and you had to xrefresh(1), and then dmesg(8) in an xterm to see what it had said.

Or syslog(3), on the presumption that there's a userland daemon "watching" (well, logging the events) ...
and ... someone maybe, eventually looks at it? Maybe that feeds into an SNMP network management console
... somewhere?

[...]

Mouse | 3 Oct 09:56 2011

Re: UNIX kernel notification system

[Do you really mean to use paragraph-length lines?  I'd suggest against
it; they impair readability significantly, at least for me.  Manually
rewrapped in the quotes below.]

> There are three basic modes of UNIX use:

> 1. traditional multi-user timesharing system.  We in NetBSD land
> still use our systems this way sometimes; cf. ftp.netbsd.org

> 2. single-user workstation (this includes laptops) and that user
> works on the "console" (or "graphics device" + "input devices").

> 3. lights-out server in some dark closet or data center.  Console?
> Uh, yeah, it's over there, on top of the KVM or serial switch;
> keyboard's in the drawer.  Be sure to blow the dust off it [...]

What about embedded?  Anything from a "smartphone" to a beagleboard to
an Arduino.  Sometimes there's something that smacks a bit of a console
(eg, the smartphone); sometimes there isn't really, or if there is it's
connected up for, at best, debugging.  This partakes of each of the
above in some respects, depending on the details.

What about machines with multiple keyboard/screen heads, with
potentially a different user using each one?  (It's not common, but
it's certainly not impossible or nonexistent or etc.)  Again, some
aspects of each of the above.

> We ought to try and come up with a notification abstraction model
> that works reasonably well for each use case, and preferably one
> which permits automated userland software response to various common
[...]

Emmanuel Dreyfus | 3 Oct 10:10 2011

Re: UNIX kernel notification system

On Mon, Oct 03, 2011 at 03:56:07AM -0400, Mouse wrote:
> Hmm, socket(AF_KMSGS,SOCK_STREAM,0)?  Not that that's the abstraction,
> just one possible way of implementing it.

I suspect you'll prefer SOCK_SEQPACKET for such a thing.
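
The reason SOCK_SEQPACKET fits is that it keeps record boundaries: each
notification arrives as one unit, where a stream socket may run several
together. A minimal sketch follows. AF_KMSGS does not exist, so AF_UNIX
stands in for it here, on the assumption that the system supports
SOCK_SEQPACKET for AF_UNIX; the function name first_record_len is
hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Send two records back to back over a connected socket pair of the
 * given type, then do a single large recv().  Returns the number of
 * bytes that first recv() delivered, or -1 on error.
 */
ssize_t
first_record_len(int type, const char *a, const char *b)
{
	int sv[2];
	char buf[256];
	ssize_t n;

	assert(a != NULL && b != NULL);

	/* AF_UNIX is a stand-in for the hypothetical AF_KMSGS. */
	if (socketpair(AF_UNIX, type, 0, sv) == -1)
		return -1;

	if (send(sv[0], a, strlen(a), 0) == -1 ||
	    send(sv[0], b, strlen(b), 0) == -1)
		n = -1;
	else
		n = recv(sv[1], buf, sizeof(buf), 0);

	close(sv[0]);
	close(sv[1]);
	return n;
}
```

With SOCK_SEQPACKET, first_record_len(SOCK_SEQPACKET, "disk full",
"link down") returns only the length of the first record, so a reader
never has to re-split messages itself.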

-- 
Emmanuel Dreyfus
manu <at> netbsd.org

Reinoud Zandijk | 3 Oct 12:04 2011

Re: RFC: SEEK_DATA/SEEK_HOLE implementation version 2

On Mon, Oct 03, 2011 at 08:33:06AM +0200, Alan Barrett wrote:
> So, if I am reverse engineering the code correctly, the design is 
> like this:
> 
>   There are no new VOP calls.
correct

>   There are two new ioctls, FIOSEEKDATA and FIOSEEKHOLE.  Each
>   file system may provide its own implementation.  If the
>   underlying file system doesn't support them, then they fail.

correct

>   There are two new lseek 'whence' flags, SEEK_DATA and SEEK_HOLE.
>   The kernel's lseek implementation forwards them to the
>   underlying file system using VOP_IOCTL(FIOSEEKDATA) and
>   VOP_IOCTL(FIOSEEKHOLE).  If the ioctl fails, then lseek
>   implements the fallback behaviour of treating the file as a
>   single data region followed by a hole after the end of file.

The vfs lseek system call does that, yes.

> I think that it would be better to implement the fallback behaviour in
> the vfs layer rather than in the lseek syscall.

I tried that before and it was in my original patch. I changed VOP_SEEK()
to accept the other two `whence' argument values. This however meant that
VOP_SEEK()'s prototype had to be extended, causing severe compatibility
issues with puffs/rump/(re)fuse etc. and resulting in a HUGE patchset.  Also,
externally maintained code like ZFS had to be changed.
[...]

Alan Barrett | 3 Oct 17:42 2011

Re: RFC: SEEK_DATA/SEEK_HOLE implementation version 2

On Mon, 03 Oct 2011, Reinoud Zandijk wrote:
> On Mon, Oct 03, 2011 at 08:33:06AM +0200, Alan Barrett wrote:
>> I think that it would be better to implement the fallback 
>> behaviour in the vfs layer rather than in the lseek syscall.
>
> I tried that before and it was in my original patch. I changed 
> the VOP_SEEK() to accept the other two `whence' argument values. 
> VOP_SEEK()'s prototype had to be extended resulting in severe 
> compatibility issues with puffs/rump/(re)fuse etc. resulting in 
> a HUGE patchset.  Also, externally maintained code like ZFS had to 
> be changed.

Your original patch did that in VOP_SEEK, yes.  I think that was a 
bad idea, and that's not what I am suggesting.

When I suggested "implement the fallback behaviour in the vfs 
layer", I meant in the vfs layer's handling of the new FIOSEEKHOLE 
and FIOSEEKDATA ioctls.  This would mean that users of the new 
lseek flags, and users of the new ioctls, would both get the 
fallback behaviour that, if the underlying file system doesn't 
know better, a file appears to have a single data region followed 
by a hole after EOF.
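
A sketch of that layering: the vfs-level ioctl handler first asks the
file system, and only supplies the single-region answer when the file
system does not know the ioctl. Everything here is a hypothetical
stand-in rather than a NetBSD kernel interface (toy_vnode,
vfs_seek_ioctl, the two toy handlers), and using ENOTTY to mean "ioctl
not supported" and ENXIO for an out-of-range offset are assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical stand-ins, not kernel interfaces. */
typedef int (*fs_seek_ioctl_t)(int want_hole, int64_t *offset);

struct toy_vnode {
	int64_t		size;		/* file size in bytes */
	fs_seek_ioctl_t	fs_ioctl;	/* file system handler, may be NULL */
};

/* A file system that does not know the ioctl at all. */
int
toy_unsupported_ioctl(int want_hole, int64_t *offset)
{
	(void)want_hole;
	(void)offset;
	return ENOTTY;
}

/* A file system that "knows better": data in [0, 4096), hole from 4096 on. */
int
toy_sparse_ioctl(int want_hole, int64_t *offset)
{
	if (*offset < 0 || (!want_hole && *offset >= 4096))
		return ENXIO;
	if (want_hole && *offset < 4096)
		*offset = 4096;
	return 0;
}

/* vfs-level handler shared by lseek(2) callers and direct ioctl callers. */
int
vfs_seek_ioctl(const struct toy_vnode *vp, int want_hole, int64_t *offset)
{
	int error;

	assert(vp != NULL && offset != NULL);

	if (vp->fs_ioctl != NULL) {
		error = vp->fs_ioctl(want_hole, offset);
		if (error != ENOTTY)
			return error;	/* handled, or failed for a real reason */
	}
	/* Fallback: one data region [0, size), hole at EOF. */
	if (*offset < 0 || *offset >= vp->size)
		return ENXIO;
	if (want_hole)
		*offset = vp->size;
	return 0;
}
```

Because the fallback sits behind the ioctl entry point, both paths see
the same answer: a 100-byte file on the unsupported file system reports
its first hole at offset 100, whichever interface is used.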

> Does this answer your question?

Not really, but I see that my suggestion was unclear.  I hope 
it's more clear now.

--apb (Alan Barrett)


David Holland | 3 Oct 20:14 2011

Re: RFC: SEEK_DATA/SEEK_HOLE implementation version 2

On Mon, Oct 03, 2011 at 12:04:36PM +0200, Reinoud Zandijk wrote:
 > > So, if I am reverse engineering the code correctly, the design is 
 > > like this:
 > > 
 > >   There are no new VOP calls.
 > correct

As I've said already a couple times, this should be done by new VOP
calls at the filesystem level. If the ioctl needs to exist, it should
be received at the VFS level.

My preference would be to add the lseek behavior strictly in
userland... assuming we really need to implement that API and can't
provide a sane user API instead.

You said the other day in chat that FreeBSD had implemented this; can
you post an explanation of their ioctl API?

-- 
David A. Holland
dholland <at> netbsd.org

David Holland | 3 Oct 20:18 2011

Re: RFC: SEEK_DATA/SEEK_HOLE implementation version 2

On Mon, Oct 03, 2011 at 09:27:35AM +0200, Reinoud Zandijk wrote:
 > > why is the VOP_FSYNC call necessary?
 > 
 > The sparse region search code depends on the indirect blocks being
 > correctly written out as it traverses them. If the file is still
 > `dirty' all the indirect blocks are present as negative indices so
 > the normal FFS code works but their indirect blocks, when addressed
 > with their disc addresses, are not up-to-date.

...so read them out of the cache.

 > The FFS sparse region search code depends on the indirect blocks to
 > see where actual data is recorded and needs the indirect blocks to
 > be up-to-date. A range sync with only the negative range might also
 > suffice but since most if not all of the applications of this code
 > is dealing with backup/processing the VOP_FSYNC() is normally a
 > NOP.

This shouldn't need to be there for what is purely a read operation on
metadata.

-- 
David A. Holland
dholland <at> netbsd.org

