Neil Brown | 2 Sep 2002 01:43
X-Face
Picon
Picon
Favicon

PATCH - change to blkdev->queue calling triggers BUG in md.c


Changeset 1.573 (just prior to 2.5.33 release) changed the calling
sequence for blk_dev[major].queue so that it is now called before the 
bd_op->open function is called.
This triggers a BUG in md.c which checked that the device was open
whenever ->queue was called.  Patch below removes the BUG.

I'm actually a little disappointed by this change.  I was hoping that
the ->queue might get changed to be passed a 'struct block_device *'
instead of a 'kdev_t' so that the device driver would only have to
interpret the device number in one place: the open.  But now that
->queue is called before ->open, that wouldn't help.

I don't suppose it would make sense to do the default:
	if (!bdev->bd_queue) {
		struct blk_dev_struct *p = blk_dev + major(dev);
		bdev->bd_queue = &p->request_queue;
	}
bit where it is now, and leave the:
		if (p->queue)
			bdev->bd_queue =  p->queue(dev);

bit until after the open?  It would keep floppy happy, and make me
happy too, but I'm not sure that it is actually 'right'...

Anyway, here is the patch that stops md from BUGging out.

NeilBrown

### Comments for ChangeSet
(Continue reading)

Linus Torvalds | 2 Sep 2002 06:13

Re: PATCH - change to blkdev->queue calling triggers BUG in md.c


On Mon, 2 Sep 2002, Neil Brown wrote:
> 
> I'm actually a little disappointed by this change.  I was hoping that
> the ->queue might get changed to be passed a 'struct block_device *'
> instead of a 'kdev_t' so that the device driver would only have to
> interpret the device number in one place: the open.  But now that
> ->queue is called before ->open, that wouldn't help.

We may still do this. 

Right now the _only_ reason to call ->queue before open() is that open() 
is also doing things like disk change checking, which reasonably needs the 
queue because it can need to do IO in order to check the disk change 
status. The floppy in fact did exactly this.

HOWEVER, that disk change checking really should be done by the generic
layers, and it should be done after the open() anyway (and not by the
open), and I think Al is actually working on this. That will allow us to 
be a bit more flexible about the ordering.

		Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Andries.Brouwer | 2 Sep 2002 10:53
Picon
Picon
Favicon

Re: PATCH - change to blkdev->queue calling triggers BUG in md.c

> HOWEVER, that disk change checking really should be done by
> the generic layers, and it should be done after the open() anyway
> (and not by the open)

Are you sure?
I am inclined to think that this would be an undesirable change of
open() semantics. Traditionally, and according to all standards,
open() will return ENXIO when the device does not exist.

Andries
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Jakob Oestergaard | 2 Sep 2002 15:25

Re: Problems with raidreconf and RAID-5

On Thu, Aug 22, 2002 at 02:18:21PM +0300, Mikko Saukkoriipi wrote:
>   I have been testing raidreconf and I've been able to add a new disk to
> a RAID-0 array, but trying to convert RAID-0 to RAID-5 or add a new disk
> to RAID-5 have failed.

I see that your disks are not of equal size - I have seen one other
report of problems with the RAID-5 driver (in reaidreconf - not the
kernel RAID-5 driver) related to disks of different sizes.

>   Just before the conversion has finished or some times even at
> beginning I get and error "raid5_map_global_to_local: disk 0 block out
> of range: 488098 (487840) gblock = 1464296" or similar and raidreconf
> aborts.

Yep - very similar to that other report.

>   I'll add below script of the program running as well the used raidtabs
> and listings of partitiontables. The example is from adding a new disk
> to a 3 drive RAID-5 array. raidreconf version is "raidreconf 0.1.1 for
> mkraid version 0.90.0".

Great - thanks

> 
> Also, what file is gmon.out? It appears after running raidreconf, but
> the documentation didn't mention anything about it.

That's profiling output - because raidreconf (for some reason) was
compiled with the -pg switch for gcc.

(Continue reading)

Linus Torvalds | 2 Sep 2002 19:01

Re: PATCH - change to blkdev->queue calling triggers BUG in md.c


On Mon, 2 Sep 2002 Andries.Brouwer <at> cwi.nl wrote:
> > HOWEVER, that disk change checking really should be done by
> > the generic layers, and it should be done after the open() anyway
> > (and not by the open)
> 
> Are you sure?
> I am inclined to think that this would be an undesirable change of
> open() semantics. Traditionally, and according to all standards,
> open() will return ENXIO when the device does not exist.

Well, one reason I don't want the low-level drivers doing the media change 
checking is that there's more to media change than just checking the 
media.

For example, the higher levels want to do a partition table re-read if the
media really has changed. We do have this strange "bd_invalidated" thing
for passing that information back, and maybe that is acceptable. It's a
bit subtle, though.

Another reason why it would be good to factor out media change from open() 
is that I can well imagine that somebody would want to do a "door open" 
ioctl on a device without a media, and we actually do kind of have that 
interface: opening with O_NDELAY historically means to not do the media 
change checks.

And guess what? Because that test is done inside the low-level driver
right now, it means that these O_NDELAY semantics aren't actually known or
followed by most drivers, _and_ it means that the higher levels don't even
realize that sometimes the media check hasn't gotten done at all (ie
(Continue reading)

Andries Brouwer | 2 Sep 2002 22:35
Picon
Picon
Favicon

Re: PATCH - change to blkdev->queue calling triggers BUG in md.c

On Mon, Sep 02, 2002 at 10:01:46AM -0700, Linus Torvalds wrote:

> For example, the higher levels want to do a partition table re-read
> if the media really has changed.

My original setup made a kernel that does not know anything about
partition tables. User space would tell the kernel about partitions
on some block device.

Roughly speaking the impact is that there is a partx invocation
before a mount.

Now it seems Al is doing all the work, so I can just sit back and watch.
But I hope he makes precisely this: a kernel that does not do any
partition reading of its own.

Andries

[Yes, it is fundamentally wrong when the kernel starts guessing.
Guessing filesystem type is bad. Also guessing partition table type
is bad. Moreover, the kernel probing may lead to device problems
and even to kernel crashes, as I last observed two days ago.
Only the user knows what she wants to do with this disk. Format?
Remove OnTrack Disk Manager? There are all kinds of situations
where partition table re-read is directly harmful.]
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

(Continue reading)

Linus Torvalds | 2 Sep 2002 22:50

Re: PATCH - change to blkdev->queue calling triggers BUG in md.c


On Mon, 2 Sep 2002, Andries Brouwer wrote:
> 
> Now it seems Al is doing all the work, so I can just sit back and watch.
> But I hope he makes precisely this: a kernel that does not do any
> partition reading of its own.

I disagree, if only because of backwards competibility issues.

On a conceptual level I think you're right. However, it will break too 
many standard installations as is.

If/when we have a reasonable initrd setup that is usable, we could do some 
automatic partitioning of devices that are available at bootup to minimize 
the impact, but I don't think it is realistic otherwise.

		Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Andries.Brouwer | 2 Sep 2002 23:27
Picon
Picon
Favicon

Re: PATCH - change to blkdev->queue calling triggers BUG in md.c

    > But I hope he makes precisely this: a kernel that does not do any
    > partition reading of its own.

    I disagree, if only because of backwards compatibility issues.

    On a conceptual level I think you're right. However, it will break too 
    many standard installations as is.

    If/when we have a reasonable initrd setup that is usable, we could do some 
    automatic partitioning of devices that are available at bootup to minimize 
    the impact, but I don't think it is realistic otherwise.

Compare it with mounting.

It would be very bad if the kernel automatically mounted all
filesystems in sight. So, user space tells what to mount.
But at boot time there is a special situation.
In the end we want to have an initrd that mounts the rootfs,
but today we give kernel command line parameters with
rootfstype= and root=.

In a similar way it is bad that the kernel automatically tries
to interpret some data on a block device as a partition table.
The user can tell the kernel. (Yes, today.)
But at boot time there is a special situation.
In the end we want to have an initrd that does the partition reading,
but now we could give a kernel command line parameter with
rootpttype= and have the kernel only parse the partition table
of the root device.

(Continue reading)

Linus Torvalds | 2 Sep 2002 23:39

Re: PATCH - change to blkdev->queue calling triggers BUG in md.c


On Mon, 2 Sep 2002 Andries.Brouwer <at> cwi.nl wrote:
> 
> Compare it with mounting.

NO.

The point about backwards compatibility is that things WORK.

There's no point in comparing things to how you _want_ them to work. The
only thing that matters for bckwards compatibility is how they work
_today_.

And your suggestion would break every single installation out there. Not 
"maybe a few".  Every single one.

(yeah, you could find some NFS-only setup that doesn't break. Big deal).

And backwards compatibility is extremely important. 

		Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Andries.Brouwer | 2 Sep 2002 23:41
Picon
Picon
Favicon

Re: PATCH - change to blkdev->queue calling triggers BUG in md.c

> The point about backwards compatibility is that things WORK.

Must I conclude that you did not read my entire letter?

Since we started this small detour talking about media change,
let me quote that fragment once more.

"[Don't think that I actually propose doing this today as the default,
but it would be a very small patch to add this as an optional
behaviour. But there is today, and there is the faraway goal.
The faraway goal is: no partition table reading in the kernel.
And that influences designing today what to do on media change.
Already today I would consider it entirely reasonable if there
was no automatic partition table reading after a media change.]"

No, my suggested changes would not break a single Linux installation
in the world.

Andries
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Gmane