Pawel Jakub Dawidek | 4 Jun 2004 10:26

spoiled but acr = 1.

Hi.

While working on geom_mirror and using geom_nop I've found that it
is possible to panic GEOM in this way:

We have providers: md0 and md0.nop (attached to md0, but not opened).

We load new class (geom_mirror).

GEOM gives provider md0 for taste.

Class is tasting md0 and opening it with r1w1e1.

Provider was opened for writing so all md0's consumers are marked as
being spoiled (md0.nop -> md0 consumer too) and g_spoil_event is sent.

Before g_spoil_event can be processed we're still in g_new_provider
event and the next provider to taste is md0.nop.

Class is trying to open md0.nop for reading.

Provider md0.nop is trying to open through its consumer (which was
marked as being spoiled) provider md0 and... panic.
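
[Editor's sketch, not part of the original message.] The sequence above can be modelled in a few lines of userland C. This is only a toy: the toy_* structs and functions are hypothetical stand-ins, not the kernel's g_provider/g_consumer or g_access(), and it assumes that opening a provider for writing spoils every attached consumer except the opener. It shows how a read open routed through an already-spoiled consumer reaches the inconsistent state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: NOT the kernel's struct g_provider / struct g_consumer. */
struct toy_provider;

struct toy_consumer {
    struct toy_provider *target; /* provider this consumer is attached to */
    bool spoiled;                /* set when target gets opened for writing */
    int acr;                     /* read access count held via this consumer */
};

struct toy_provider {
    const char *name;
    struct toy_consumer *attached[4]; /* consumers attached to this provider */
    size_t nattached;
    struct toy_consumer *lower;       /* consumer used to reach the backing
                                         provider; NULL for a leaf like md0 */
};

/*
 * Open `pp` for writing through `opener`.  Assumption: every other
 * attached consumer is marked spoiled; the spoil event itself is only
 * queued, so the flag stays set until that event is processed.
 */
void toy_open_write(struct toy_provider *pp, struct toy_consumer *opener)
{
    for (size_t i = 0; i < pp->nattached; i++)
        if (pp->attached[i] != opener)
            pp->attached[i]->spoiled = true;
}

/*
 * Open `pp` for reading.  Returns false where the toy detects a request
 * for access through a spoiled consumer (the situation in which the
 * real system panicked).
 */
bool toy_open_read(struct toy_provider *pp)
{
    if (pp->lower == NULL)
        return true;              /* leaf provider, nothing below it */
    if (pp->lower->spoiled)
        return false;             /* spoiled consumer asked for acr = 1 */
    pp->lower->acr++;
    return toy_open_read(pp->lower->target);
}
```

With md0 carrying two consumers (one belonging to md0.nop, one to the tasting class), calling toy_open_write() for the tasting class and then toy_open_read() on md0.nop returns false, which corresponds to the window between marking the consumer and processing the queued spoil event.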

I'm not sure how to solve this yet. Maybe before we give a provider
for taste we should check if any of its consumers is marked as being
spoiled. But when should we come back to taste this provider?

Hmm, or maybe something like this:

(Continue reading)

Pawel Jakub Dawidek | 4 Jun 2004 12:34

Re: spoiled but acr = 1.

On Fri, Jun 04, 2004 at 10:26:59AM +0200, Pawel Jakub Dawidek wrote:
+> Hi.
+> 
+> While working on geom_mirror and using geom_nop I've found that it
+> is possible to panic GEOM in this way:
+> 
+> We have providers: md0 and md0.nop (attached to md0, but not opened).
+> 
+> We load new class (geom_mirror).
+> 
+> GEOM gives provider md0 for taste.
+> 
+> Class is tasting md0 and opening it with r1w1e1.
+> 
+> Provider was opened for writing so all md0's consumers are marked as
+> being spoiled (md0.nop -> md0 consumer too) and g_spoil_event is sent.
+> 
+> Before g_spoil_event can be processed we're still in g_new_provider
+> event and the next provider to taste is md0.nop.
+> 
+> Class is trying to open md0.nop for reading.
+> 
+> Provider md0.nop is trying to open through its consumer (which was
+> marked as being spoiled) provider md0 and... panic.
+> 
+> 
+> I'm not sure how to solve this yet. Maybe before we give a provider
+> for taste we should check if any of its consumers is marked as being
+> spoiled. But when should we come back to taste this provider?
+> 

Lukas Ertl | 10 Jun 2004 22:05

Correct GEOM bio handling

Hi there,

I've run into a problem with how to correctly handle struct bios in GEOM. 
I have the following scenario in vinum:

A plex needs to be synced because its data is out-of-date.  The solution I 
was thinking of is to create a kthread which reads the data from a 'good' 
plex and writes it out to the 'bad' plex.  Now, it would be ideal if 
'normal' requests (which are not part of this rebuild process) are already 
accepted while the rebuild process is still on-going.  Of course, this 
could be a problem if the new data is later overwritten by the rebuild 
process.  So I was thinking of cloning the incoming bio, checking if the
adjusted offsets are beyond where the rebuild process currently is, and if
they are, putting the clone on a 'waitlist' where it will be picked up by
the rebuilding kthread once the rebuild pointer is far enough and
then scheduled down.
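
[Editor's sketch, not part of the original message.] The waitlist decision described above boils down to a comparison against the rebuild pointer. The snippet below is a userland illustration under stated assumptions: struct toy_req is a hypothetical stand-in for a cloned bio, "beyond the rebuild pointer" is read as "touches any byte not yet rebuilt", and the waitlist is a plain array rather than a kernel queue. No GEOM API is used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request descriptor; a stand-in for a cloned struct bio. */
struct toy_req {
    int64_t offset; /* adjusted offset on the plex, in bytes */
    int64_t length; /* request length, in bytes */
};

/*
 * A request must wait if any part of it lies at or beyond the rebuild
 * pointer, i.e. it touches a region the rebuilder has not copied yet.
 */
bool toy_must_wait(const struct toy_req *r, int64_t rebuild_ptr)
{
    return r->offset + r->length > rebuild_ptr;
}

/*
 * Called by the rebuilder after it advances.  Moves every request that
 * no longer needs to wait from `waitlist` to `out` (which must have room
 * for *nwait entries), compacts the waitlist in place, and returns the
 * number of released requests.
 */
size_t toy_release_ready(struct toy_req *waitlist, size_t *nwait,
                         int64_t rebuild_ptr, struct toy_req *out)
{
    size_t released = 0, kept = 0;

    for (size_t i = 0; i < *nwait; i++) {
        if (toy_must_wait(&waitlist[i], rebuild_ptr))
            waitlist[kept++] = waitlist[i];
        else
            out[released++] = waitlist[i];
    }
    *nwait = kept;
    return released;
}
```

In a real class the released requests would then be passed down; whatever releases them also has to make sure each one is eventually completed, otherwise the originating process waits forever.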

The rebuilding itself works fine, the requests on the waitqueue are 
detected, but they seem to be ignored once I g_io_request() them, and the 
process that initiated them is stuck.

So, I'm thinking that I'm missing some important detail in this bio 
handling, and I could use some input from you guys.

Thank you,
le

-- 
Lukas Ertl                         http://homepage.univie.ac.at/l.ertl/
le <at> FreeBSD.org                     http://people.freebsd.org/~le/

Poul-Henning Kamp | 10 Jun 2004 22:13

Re: Correct GEOM bio handling

In message <20040610214726.G23746 <at> leelou.in.tern>, Lukas Ertl writes:

>A plex needs to be synced because its data is out-of-date.  The solution I 
>was thinking of is to create a kthread which reads the data from a 'good' 
>plex and writes it out to the 'bad' plex.  Now, it would be ideal if 
>'normal' requests (which are not part of this rebuild process) are already 
>accepted while the rebuild process is still on-going.

Normally what you do is you block the bad plex for reading but not
for writing.  That means all normal writes go also to the bad plex,
no matter where on the bad plex they are located.

Your rebuilder will then read from the good and write to the bad in
a sequential fashion, and when it is done, the bad plex is good too
and can be released for reading.

Some implementations use compressed bitmaps, so that they know that
bits where a normal write happened can be skipped by the rebuilder.

Some even use "parasitic rebuild" where normal reads are written
to the bad plex as well (if not already up to date) in order to
save on the I/O operations.
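
[Editor's sketch, not part of the original message.] The scheme described above (reads only from the good plex, writes to both, a sequential rebuilder that skips blocks a normal write already refreshed, plus the "parasitic" copy on read) can be illustrated with a small in-memory model. Everything here is hypothetical: the toy_* names, the tiny block counts, and the plain bool array standing in for a compressed bitmap. It is single-threaded, so it says nothing about the locking a real implementation needs between the rebuilder and normal writes to the same block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_NBLOCKS 8 /* blocks per plex (toy value) */
#define TOY_BLKSZ   4 /* bytes per block (toy value) */

struct toy_mirror {
    unsigned char good[TOY_NBLOCKS][TOY_BLKSZ];
    unsigned char bad[TOY_NBLOCKS][TOY_BLKSZ];
    bool uptodate[TOY_NBLOCKS]; /* block is already current on the bad plex */
    bool bad_readable;          /* false while the bad plex is blocked for reads */
};

/* Normal write: goes to both plexes, wherever the rebuilder currently is. */
void toy_write(struct toy_mirror *m, size_t blk, const unsigned char *data)
{
    assert(blk < TOY_NBLOCKS);
    memcpy(m->good[blk], data, TOY_BLKSZ);
    memcpy(m->bad[blk], data, TOY_BLKSZ);
    m->uptodate[blk] = true;
}

/*
 * Normal read: served from the good plex only.  "Parasitic rebuild":
 * the data just read is also written to the bad plex if that block is
 * not up to date yet.
 */
const unsigned char *toy_read(struct toy_mirror *m, size_t blk)
{
    assert(blk < TOY_NBLOCKS);
    if (!m->uptodate[blk]) {
        memcpy(m->bad[blk], m->good[blk], TOY_BLKSZ);
        m->uptodate[blk] = true;
    }
    return m->good[blk];
}

/*
 * Sequential rebuilder: copies good -> bad, skipping blocks the bitmap
 * marks as current.  Returns how many blocks it actually copied and
 * releases the bad plex for reading when done.
 */
size_t toy_rebuild(struct toy_mirror *m)
{
    size_t copied = 0;

    for (size_t blk = 0; blk < TOY_NBLOCKS; blk++) {
        if (m->uptodate[blk])
            continue;
        memcpy(m->bad[blk], m->good[blk], TOY_BLKSZ);
        m->uptodate[blk] = true;
        copied++;
    }
    m->bad_readable = true;
    return copied;
}
```

If one block is written and another is read before toy_rebuild() runs, the rebuilder copies only the remaining six blocks, and the two plexes are identical afterwards.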

>The rebuilding itself works fine, the requests on the waitqueue are 
>detected, but they seem to be ignored once I g_io_request() them, and the 
>process that initiated them is stuck.

Can you find where they are ?  Are they on the I/O list ?

What if you set debugflags=4, can you see where they went ?

Lukas Ertl | 11 Jun 2004 00:07

Re: Correct GEOM bio handling

On Thu, 10 Jun 2004, Poul-Henning Kamp wrote:

> Normally what you do is you block the bad plex for reading but not
> for writing.  That means all normal writes go also to the bad plex,
> no matter where on the bad plex they are located.

Ah, ok, that makes sense.

>> The rebuilding itself works fine, the requests on the waitqueue are
>> detected, but they seem to be ignored once I g_io_request() them, and the
>> process that initiated them is stuck.
>
> Can you find where they are ?  Are they on the I/O list ?

They are neither on the bio_down queue nor on the bio up queue.  They 
vanished. :-)

> What if you set debugflags=4, can you see where they went ?

I assume you meant debugflags=2, since that's the bio debuglevel, but I 
don't see them there either.

If I biowait() on them then I can see that BIO_DONE isn't set, so it 
will not return.

Anyway, if I let writes go through unconditionally and reject all reads, 
then I probably don't have this problem at all.

thanks,
le

Pawel Jakub Dawidek | 11 Jun 2004 09:34

Re: Correct GEOM bio handling

On Thu, Jun 10, 2004 at 10:13:08PM +0200, Poul-Henning Kamp wrote:
+> Some implementations use compressed bitmaps, so that they know that
+> bits where a normal write happened can be skipped by the rebuilder.

I'm using bitmaps to do synchronization in geom_mirror.
It works quite ok.

+> Some even use "parasitic rebuild" where normal reads are written
+> to the bad plex as well (if not already up to date) in order to
+> save on the I/O operations.

Nice idea, I need to put it into geom_mirror:)

-- 
Pawel Jakub Dawidek                       http://www.FreeBSD.org
pjd <at> FreeBSD.org                           http://garage.freebsd.pl
FreeBSD committer                         Am I Evil? Yes, I Am!

Poul-Henning Kamp | 13 Jun 2004 19:14

GEOM class idea...


OK, here is one of the more nasty ideas for a GEOM class:

Many of us read CD's into iso images, stick them on a harddisk and
mount them from there when we need to access them.  This usually
costs us a md(4) vnode gadget, and that is really a waste.

Write a GEOM class which is a slicer, but it needs to work on CD9660
image headers as metadata, and work the following way:

Read in the first ISO image onto our archive disk:

	dd if=/dev/acd0 of=/dev/ad8 bs=2k

On close, the disk is tasted, and our "geom_cdarch" class finds a
valid CD9660 volume description and attaches to the disk.

It extracts the image size from the CD9660 descriptor (offset 0x8050,
32bitLE.  Repeated at 0x8054 as 32bitBE) and creates a slice with
this ISO image in it.
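
[Editor's sketch, not part of the original message.] Reading the size field at the offsets given above can be sketched in userland C as follows. The function name and return convention are hypothetical. The check for descriptor type 1 followed by "CD001" at 0x8000 and the fact that the field counts logical blocks (normally 2048 bytes each) rather than bytes come from the ISO 9660 layout, not from this message. Treating a mismatch between the little-endian and big-endian copies as "no valid descriptor" is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_PVD_OFF     0x8000u /* primary volume descriptor (sector 16) */
#define TOY_SIZE_LE_OFF 0x8050u /* volume size, 32-bit little-endian */
#define TOY_SIZE_BE_OFF 0x8054u /* same value, 32-bit big-endian */

static uint32_t toy_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint32_t toy_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Looks for a CD9660 primary volume descriptor in the first `len` bytes
 * of an image.  On success stores the volume size (in logical blocks)
 * in *blocks and returns 0; returns -1 if no valid descriptor is found.
 */
int toy_iso_volume_blocks(const unsigned char *img, size_t len,
                          uint32_t *blocks)
{
    uint32_t le, be;

    if (len < TOY_SIZE_BE_OFF + 4)
        return -1;                       /* too short to hold the field */
    if (img[TOY_PVD_OFF] != 1 ||
        memcmp(img + TOY_PVD_OFF + 1, "CD001", 5) != 0)
        return -1;                       /* no primary volume descriptor */

    le = toy_le32(img + TOY_SIZE_LE_OFF);
    be = toy_be32(img + TOY_SIZE_BE_OFF);
    if (le != be)
        return -1;                       /* the two copies disagree */

    *blocks = le;
    return 0;
}
```

A slicer along these lines would multiply the block count by the logical block size to find where the image ends and where the free space begins.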

Since there is no valid CD9660 descriptor on the disk right after
this image, the remaining free space gets put into a special
slice ("ad8.freespace").

To read in the next ISO image:

	dd if=/dev/acd0 of=/dev/ad8.freespace bs=2k

On close the geom_cdarch class looks for a valid CD9660 volume and
(Continue reading)

Soeren Straarup | 14 Jun 2004 16:05

Re: GEOM class idea...

On Sun, 13 Jun 2004, Poul-Henning Kamp wrote:

>
> OK, here is one of the more nasty ideas for a GEOM class:
>
> Many of us read CD's into iso images, stick them on a harddisk and
> mount them from there when we need to access them.  This usually
> costs us a md(4) vnode gadget, and that is really a waste.
>
> Write a GEOM class which is a slicer, but it needs to work on CD9660
> image headers as metadata, and work the following way:
>
> Read in the first ISO image onto our archive disk:
>
> 	dd if=/dev/acd0 of=/dev/ad8 bs=2k
>
> On close, the disk is tasted, and our "geom_cdarch" class finds a
> valid CD9660 volume description and attaches to the disk.
>
> It extracts the image size from the CD9660 descriptor (offset 0x8050,
> 32bitLE.  Repeated at 0x8054 as 32bitBE) and creates a slice with
> this ISO image in it.
>
> Since there is no valid CD9660 descriptor on the disk right after
> this image, the remaining free space gets put into a special
> slice ("ad8.freespace").
>
> To read in the next ISO image:
>
> 	dd if=/dev/acd0 of=/dev/ad8.freespace bs=2k

Poul-Henning Kamp | 14 Jun 2004 16:20

Re: GEOM class idea...

In message <20040614160301.Y81182-100000 <at> x12.dk>, Soeren Straarup writes:

>Question: Will the iso 'fs' be bootable and listed by the boot manager?
>That is if the iso is a bootable iso image

No.  This is purely meant for "CD-server" use.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk <at> FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.

Gordon Tetlow | 15 Jun 2004 00:09

Re: GEOM class idea...

On Sun, Jun 13, 2004 at 07:14:11PM +0200, Poul-Henning Kamp wrote:
> 
> OK, here is one of the more nasty ideas for a GEOM class:
> 
> Many of us read CD's into iso images, stick them on a harddisk and
> mount them from there when we need to access them.  This usually
> costs us a md(4) vnode gadget, and that is really a waste.

... description of grotty geom class ...

I can't imagine that all the pain that you are talking about is a
worthwhile effort when it's so easy to do a md backed file. I can
just about guarantee that users will have spare files and capacity
before they have a spare disk running around.

Maybe I'm missing something here, but what is the advantage of
going straight off of the disk? Are you trying to avoid the FFS
filesystem overhead?

-gordon
