Alan Cox | 1 Apr 2007 02:14

Re: broken device locking, sg vs. sg_io on block devices

> > > The use of /dev/sg* is still common practice, its invention predates
> > 
> > The /dev/sg interface cannot do the locking. If you use /dev/sg you are
> 
> Again, it doesn't have to. It can pass the locking operations to the
> related block device driver.

No it can't. The driver has no idea what the locking rules are for
arbitrary command blocks sent to arbitrary devices. /dev/sg is a *raw*
interface. You can send anything to anyone, and the locking rules for
that are far too complex for a giant morass of kernel code to get added.

The mess begins because you use /dev/sg and put it in a cdrom group
instead of using SG_IO on the /dev/sr device. The mess continues because
of the use of O_EXCL locking, thus forcing re-open/close by HAL instead
of fcntl based co-operative locking.
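
To illustrate what fcntl based co-operative locking looks like from user
space, here is a minimal sketch. It assumes every co-operating program
agrees to lock the same device node before touching it; the function names
are hypothetical, and a path such as /dev/sr0 is only an example. The lock
is advisory, so it arbitrates only between programs that also take it.

```python
import errno
import fcntl
import os


def try_lock_device(path):
    """Open path and try to take an exclusive advisory lock without blocking.

    Returns the open file descriptor on success, or None if another
    process already holds a conflicting lock.
    """
    # An exclusive (write) lock needs a descriptor opened for writing.
    fd = os.open(path, os.O_RDWR)
    try:
        # lockf() wraps the fcntl() record-locking calls.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as exc:
        os.close(fd)
        if exc.errno in (errno.EACCES, errno.EAGAIN):
            return None  # somebody else holds the lock
        raise
    return fd


def unlock_device(fd):
    """Release the advisory lock and close the descriptor."""
    fcntl.lockf(fd, fcntl.LOCK_UN)
    os.close(fd)
```

A caller that gets None back can wait and retry instead of forcing the
holder to close and re-open the device. Note that these locks belong to
the process, so closing any descriptor for the same file drops them.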

The job of the kernel is not and never has been to anticipate and correct
everything stupid someone tries to do in user space. 

As I said before, the people wanting to arbitrate serial ports got this
right in the mid 1970s. Your situation is not much more complicated,
unless you persist in using /dev/sg - which yes does make it hard, but so
does writing it in COBOL, or while standing on your head. And the
solution to all three cases is the same: *DON'T DO IT*

Alan
Rafael J. Wysocki | 1 Apr 2007 01:26

Re: [PATCH] Fix microcode-related suspend problem

On Saturday, 31 March 2007 23:23, Adrian Bunk wrote:
> On Sat, Mar 31, 2007 at 01:35:32PM -0700, Andrew Morton wrote:
> > On Sat, 31 Mar 2007 22:04:15 +0200 "Rafael J. Wysocki" <rjw <at> sisk.pl> wrote:
> > 
> > > This patch appeared on LKML six days ago and there have not been any negative
> > > comments since then, so I think I can try to make it official.
> > > 
> > > ---
> > > From: Rafael J. Wysocki <rjw <at> sisk.pl>
> > > 
> > > Fix the regression resulting from the recent change of suspend code ordering
> > > that causes systems based on Intel x86 CPUs using the microcode driver to
> > > hang during the resume.
> > > 
> > > The problem occurs since the microcode driver uses request_firmware() in its
> > > CPU hotplug notifier, which is called after tasks have been frozen and hangs.
> > > It can be fixed by telling the microcode driver to use the microcode stored in
> > > memory during the resume instead of trying to load it from disk.
> > 
> > CONFIG_SMP=n:
> > 
> > arch/i386/kernel/microcode.c: In function 'microcode_init_cpu':
> > arch/i386/kernel/microcode.c:628: error: 'suspend_cpu_hotplug' undeclared (first use in this function)
> > arch/i386/kernel/microcode.c:628: error: (Each undeclared identifier is reported only once
> > arch/i386/kernel/microcode.c:628: error: for each function it appears in.)
> > arch/i386/kernel/microcode.c: In function 'mc_sysdev_add':
> > arch/i386/kernel/microcode.c:717: error: 'suspend_cpu_hotplug' undeclared (first use in this function)
> > arch/i386/kernel/microcode.c: In function 'mc_sysdev_remove':
> > arch/i386/kernel/microcode.c:745: error: 'suspend_cpu_hotplug' undeclared (first use in this function)
> > 

Parag Warudkar | 1 Apr 2007 02:01

Re: [RFC] UML kernel & rootfs bundle with every kernel release ?

Hi 

<devzero <at> web.de> writes:

> Whenever you want to test some new kernel (feature), you may put your main
> system at risk, know exactly what you're doing - or - use UserModeLinux.

Why won't qemu work better in this case? I generally keep a debian testing
installation on disk and when I compile a new kernel I just point qemu to load
it with the debian root fs. It's fast enough (even the kernel mode accelerator
module is GPLed now) and you don't need to mess around with your main system's
kernel. You can even test different arches - like both x86_64 and i386 on one
x64 box for example.

It doesn't work for any particular driver related testing but UML won't either.
And it won't benefit UML development but I don't know if that was your main
objective as opposed to general kernel experimentation and testing.

Parag

Florian D. | 1 Apr 2007 01:44

cannot add device to partitioned raid6 array

hi list!

in short:
I created a partitioned raid6 array with 2 missing drives. Now, I want to add a device. It fails with:
flockmock ~ # mdadm -a /dev/md_d4 /dev/sdb2
mdadm: add new device failed for /dev/sdb2 as 4: Invalid argument

I think it is the same problem as in:
http://marc.info/?l=linux-raid&m=115316147716600&w=2

details:
kernel 2.6.20.4
mdadm-2.6.1
the raid6 array:

flockmock ~ # mdadm --detail /dev/md_d4
/dev/md_d4:
         Version : 00.90.03
   Creation Time : Sat Mar 31 19:48:58 2007
      Raid Level : raid6
      Array Size : 490030464 (467.33 GiB 501.79 GB)
   Used Dev Size : 245015232 (233.66 GiB 250.90 GB)
    Raid Devices : 4
   Total Devices : 2
 Preferred Minor : 4
     Persistence : Superblock is persistent

     Update Time : Sat Mar 31 23:28:37 2007
           State : clean, degraded
  Active Devices : 2

Stephen Clark | 1 Apr 2007 02:27

Re: 2.6.21-rc5 from fc7-rc2 problems

Andrew Morton wrote:

>On Sat, 31 Mar 2007 16:14:01 -0400 Stephen Clark <Stephen.Clark <at> seclark.us> wrote:
>
>  
>
>>Hello,
>>
>>I have just tried booting the fc7-rc2 live cd on 2 of my laptops and it 
>>failed on both.
>>
>>Laptop 1 is an asus vbi s96f that gets a panic that says exception in 
>>interrupt routine
>>for my rtl8139.
>>
>>This device works fine in 2.6.20.2
>>    
>>
>
>That's regression #1.  Are you able to take a photograph of the screen
>when it has crashed?  Setting the display to 50 rows would make that more
>useful.  (serial console would be better, but I assume that thing has
>no serial port).
>
>  
>
>>The other laptop is a hp n5430; it fails in the ali-pata driver, not being 
>>able to read the cdrom, timing out
>>and dropping into a bash shell telling me to tell it where the root 
>>filesystem is.

James Simmons | 1 Apr 2007 02:56

Re: [PATCH] fbdev sysfs imrovements


I can't seem to duplicate this error. Do you have any patches applied to 
the nvidia driver?

On Mon, 26 Mar 2007, Andrew Morton wrote:

> On Tue, 20 Mar 2007 14:25:49 +0000 (GMT) James Simmons <jsimmons <at> infradead.org> wrote:
> 
> > This patch does several things to allow the underlying hardware to be 
> > shared among many devices. The most important thing is the use of
> > the created device via device_create instead of the hardware device. No 
> > longer should fbdev drivers use the xxx_set_drvdata with the parent
> > bus device. The second change is having a bus independent power management
> > for the framebuffer driver. The final change is using the release method 
> > to cleanup the device. The reason again is to make the fbdev driver 
> > independent of the bus parent device. Feedback is welcomed.
> 
> Causes an early crash on the powermac G5.
> 
> http://userweb.kernel.org/~akpm/s5000489.jpg (the oops is the usual powerpc
> mess)
> 
> config at http://userweb.kernel.org/~akpm/config-g5.txt
> 
Andrew Morton | 1 Apr 2007 03:24

Re: [PATCH] fbdev sysfs imrovements

> On Sun, 1 Apr 2007 01:56:28 +0100 (BST) James Simmons <jsimmons <at> infradead.org> wrote:
> 
> I can't seem to duplicate this error.

OK, I'll poke at it some more.

>  Do you have any patches applied to 
> the nvidia driver?

I don't think so - whatever's in -mm.

> On Mon, 26 Mar 2007, Andrew Morton wrote:
> 
> > On Tue, 20 Mar 2007 14:25:49 +0000 (GMT) James Simmons <jsimmons <at> infradead.org> wrote:
> > 
> > > This patch does several things to allow the underlying hardware to be 
> > > shared among many devices. The most important thing is the use of
> > > the created device via device_create instead of the hardware device. No 
> > > longer should fbdev drivers use the xxx_set_drvdata with the parent
> > > bus device. The second change is having a bus independent power management
> > > for the framebuffer driver. The final change is using the release method 
> > > to cleanup the device. The reason again is to make the fbdev driver 
> > > independent of the bus parent device. Feedback is welcomed.
> > 
> > Causes an early crash on the powermac G5.
> > 
> > http://userweb.kernel.org/~akpm/s5000489.jpg (the oops is the usual powerpc
> > mess)
> > 
> > config at http://userweb.kernel.org/~akpm/config-g5.txt

Aaron Lehmann | 1 Apr 2007 03:27

Silent corruption on AMD64

Hello,

I discovered a reproducible way of causing silent file corruption.
Unfortunately, this method happens to be my backup procedure :(.

Background: I have five hard drives. sda and sdb are on a SiI 3112
card. sdc and sdd use onboard sata_via. hda uses onboard VIA VT8237
IDE. All filesystems are ext3. Ethernet is PCI RTL8169. My kernel is
2.6.20.1, configured for SMP and PREEMPT, but I was able to confirm
that this corruption happens without SMP or PREEMPT (though it's
rarer).

The following simultaneous actions result in corrupt data being read
from one of the sata_sil drives:

1. rsync files from sdd to sdc
2. rsync files from sdb to a remote host

If I run md5sum on a few hundred megabytes on sdb while doing these
things, the md5sum computed will usually be wrong. I believe the data
getting rsynced off sdb is also corrupt.
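
The md5sum check described above could be automated roughly as follows.
This is a minimal sketch with hypothetical function names: it hashes the
same file several times and reports how many distinct digests it saw.
It assumes the file is large enough, or the page cache is dropped between
passes, that each pass really reads from the disk rather than from memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at path, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns b"" at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def distinct_digests(path, passes=5):
    """Hash the same file several times and return the set of digests seen.

    More than one element in the result means two reads of the same
    file returned different data.
    """
    return {md5_of_file(path) for _ in range(passes)}
```

Running distinct_digests() on a file on sdb while the other loads are
active, and getting more than one digest back, would match the symptom
described here.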

I have spent a lot of time trying to find a simpler test case. So far,
as far as I can tell, there are three conditions that must be
satisfied for corruption to occur:

1. Heavy Ethernet load (nc remotehost < /dev/zero)
2. Heavy disk write load on any non-sata_sil drive (cat /dev/zero > /path)
3. Heavy disk read load on any other drive (tar c /path | cat > /dev/null)


Oleg Verych | 1 Apr 2007 04:34

Re: broken device locking, sg vs. sg_io on block devices

> From: Alan Cox
> Newsgroups: gmane.linux.kernel
> Subject: Re: broken device locking, sg vs. sg_io on block devices
> Date: Sun, 1 Apr 2007 01:14:52 +0100
>
[]
>> Again, it doesn't have to. It can pass the locking operations to the
>> related block device driver.
>
> No it can't. The driver has no idea what the locking rules are for
> arbitrary command blocks sent to arbitrary devices. /dev/sg is a *raw*
> interface. You can send anything to anyone, and the locking rules for
> that are far too complex for a giant morass of kernel code to get added.
>
> The mess begins because you use /dev/sg and put it in a cdrom group
> instead of using SG_IO on the /dev/sr device.

(offtop: 'cdrom' is as ugly as 'floppy' for anything like usb,
firewire connected storage, why not use 'optics' and 'external' or
something?)

> The mess continues because of the use of O_EXCL locking, thus forcing
> re-open/close by HAL

Manpage states something bad about it also...

> instead of fcntl based co-operative locking.

> > > getty/modem/uucp/terminal emulator/slip/ppp/..


Andrew Morton | 1 Apr 2007 04:52

Re: Silent corruption on AMD64

> On Sat, 31 Mar 2007 18:27:36 -0700 Aaron Lehmann <aaronl <at> vitelus.com> wrote:
> I have spent a lot of time trying to find a simpler test case. So far,
> as far as I can tell, there are three conditions that must be
> satisfied for corruption to occur:
> 
> 1. Heavy Ethernet load (nc remotehost < /dev/zero)
> 2. Heavy disk write load on any non-sata_sil drive (cat /dev/zero > /path)
> 3. Heavy disk read load on any other drive (tar c /path | cat > /dev/null)
> 
> With these conditions satisfied, data read off sda or sdb (the drives
> associated with sata_sil) is often corrupted. Since I can only see
> this problem with files on those two drives, I'm inclined to suspect
> the sata_sil driver, but I really have no idea what's going on. I know
> this is not a recent issue - I experienced very similar corruption at
> least a year ago. I wasn't able to reproduce it at the time, because
> it only appeared in the backups I was restoring from.

Are you able to provide us with some before-and-after data so we
can see this corruption?

See, if it's dropped-bits or shifted-data or eight-byte-aligned
kernel addresses or whatever, that helps us generate theories..
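
One way to gather that kind of before-and-after data is sketched below.
It assumes a known-good copy and a suspect copy of the same file have
already been read into memory; the function names and output format are
hypothetical. It lists each run of differing bytes with its offset,
length and XOR pattern, which is the sort of detail that separates
flipped bits from shifted or overwritten data.

```python
def diff_runs(good, bad):
    """Return (offset, length) runs where two byte strings differ.

    Only the common prefix length is compared if the sizes differ.
    """
    runs = []
    start = None
    for i, (a, b) in enumerate(zip(good, bad)):
        if a != b:
            if start is None:
                start = i  # a new differing run begins here
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:
        # The last run extends to the end of the compared region.
        runs.append((start, min(len(good), len(bad)) - start))
    return runs


def describe_runs(good, bad):
    """Format each differing run with its offset, length and XOR bytes."""
    lines = []
    for offset, length in diff_runs(good, bad):
        g = good[offset:offset + length]
        b = bad[offset:offset + length]
        xor = bytes(x ^ y for x, y in zip(g, b))
        lines.append("offset 0x%08x len %d xor %s" % (offset, length, xor.hex()))
    return lines
```

For large files the two copies would need to be compared in chunks
rather than loaded whole; that is left out here to keep the sketch short.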