Borislav Petkov | 1 Aug 2011 10:48

Re: Machine check exception

On Sun, Jul 31, 2011 at 11:56:33AM -0400, F. P. Beekhof wrote:
> $ sudo setpci -s 18.3 0x44.l
> 00400040

Interesting, this says ECC is enabled on your machine. Do you know the
exact models of your DIMMs? Sometimes they can be found in the dmidecode
output, so can you run

dmidecode > dmidecode.out

and

lspci -s 18.3 -xxxx > f3.out

as root and send me both files pls?
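For reference, here is a minimal C sketch of the check behind "ECC is enabled". It assumes the bit layout that the kernel's amd64_edac driver uses for the K8 northbridge configuration register at F3x44: ECC enable in bit 22 and chipkill in bit 23. The helper names are illustrative and are not part of any existing tool.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit layout of the K8 northbridge configuration register
 * (PCI function 3, offset 0x44), as used by amd64_edac. */
#define NBCFG_CHIPKILL   (1u << 23)
#define NBCFG_ECC_ENABLE (1u << 22)

/* True if DRAM ECC is enabled in a raw F3x44 value, such as the 32-bit
 * word printed by "setpci -s 18.3 0x44.l". */
static bool nbcfg_ecc_enabled(uint32_t nbcfg)
{
	return (nbcfg & NBCFG_ECC_ENABLE) != 0;
}

/* True if chipkill ECC is enabled in the same register. */
static bool nbcfg_chipkill_enabled(uint32_t nbcfg)
{
	return (nbcfg & NBCFG_CHIPKILL) != 0;
}
```

With the value reported above, 0x00400040, bit 22 is set and bit 23 is clear. Under this assumed layout, that means ECC is on and chipkill is off.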

..

> >> Are there any conclusions that can be drawn from this experiment ?
> >
> > Yeah, it means that your BIOS doesn't seem to have the fix for erratum
> > #131: http://support.amd.com/us/Processor_TechDocs/25759.pdf, page 83.
> 
> So, this is not a problem with the promise-sata driver as I originally 
> suspected. I guess then we can take this discussion off the linux-ide 
> list...

Yeah, looks like it.


Kushal Koolwal | 1 Aug 2011 20:04

PATA RDC IDE driver READ DMA error messages

Hi,

We have an x86 system based on the Vortex86DX SoC with an RDC IDE
controller that supports IDE speeds up to UDMA2. Whenever a Compact
Flash card is present on the IDE controller, I get the following ATA error
handling messages before the IDE Linux kernel driver (pata_rdc.ko)
finally settles down (after about a minute) to PIO4 speed for
the Compact Flash:

[   37.827489] ata1: lost interrupt (Status 0x58)
[   37.827910] ata1: drained 512 bytes to clear DRQ.
[   37.827961] ata1.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[   37.828060] ata1.01: failed command: READ DMA
[   37.828162] ata1.01: cmd c8/00:08:00:00:00/00:00:00:00:00/f0 tag 0 dma 4096 in
[   37.828176]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[   37.828311] ata1.01: status: { DRDY }
[   37.828452] ata1: soft resetting link
[   38.280580] ata1.00: configured for UDMA/33
[   38.287670] ata1.01: configured for MWDMA2
[   38.287707] ata1.01: device reported invalid CHS sector 0
[   38.287753] ata1: EH complete
[   68.834877] ata1: lost interrupt (Status 0x58)
[   68.835285] ata1: drained 512 bytes to clear DRQ.
[   68.835323] ata1.01: limiting speed to MWDMA1:PIO4
[   68.835354] ata1.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[   68.835448] ata1.01: failed command: READ DMA
[   68.835548] ata1.01: cmd c8/00:08:00:00:00/00:00:00:00:00/f0 tag 0 dma 4096 in
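The cmd/res lines follow the fixed layout that libata's error handler prints: command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device. On the res line, the first byte is the status register. Below is a small C sketch, with hypothetical helper names, that unpacks the READ DMA entry above under that assumption.

```c
#include <stdio.h>

/* Fields of a libata taskfile dump, in the order they are printed. */
struct tf_dump {
	unsigned command, feature, nsect, lbal, lbam, lbah;
	unsigned hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah;
	unsigned device;
};

/* Parses the hex part after "cmd " or "res "; returns 0 on success. */
static int parse_tf_dump(const char *s, struct tf_dump *tf)
{
	int n = sscanf(s, "%2x/%2x:%2x:%2x:%2x:%2x/%2x:%2x:%2x:%2x:%2x/%2x",
		       &tf->command, &tf->feature, &tf->nsect,
		       &tf->lbal, &tf->lbam, &tf->lbah,
		       &tf->hob_feature, &tf->hob_nsect,
		       &tf->hob_lbal, &tf->hob_lbam, &tf->hob_lbah,
		       &tf->device);
	return n == 12 ? 0 : -1;
}

/* Sector count of a 28-bit command such as READ DMA (0xC8);
 * a count of 0 means 256 sectors. */
static unsigned tf_sectors28(const struct tf_dump *tf)
{
	return tf->nsect ? tf->nsect : 256;
}

/* 28-bit LBA: bits 27:24 live in the low nibble of the device register. */
static unsigned long tf_lba28(const struct tf_dump *tf)
{
	return ((unsigned long)(tf->device & 0x0f) << 24) |
	       ((unsigned long)tf->lbah << 16) |
	       ((unsigned long)tf->lbam << 8) | tf->lbal;
}

/* The DEV bit (0x10) selects device 1, i.e. "ata1.01". */
static int tf_device_index(const struct tf_dump *tf)
{
	return (tf->device & 0x10) ? 1 : 0;
}
```

For c8/00:08:00:00:00/00:00:00:00:00/f0, this gives READ DMA (0xC8) of 8 sectors (4096 bytes, matching "dma 4096 in") at LBA 0 on device 1. The res status 0x40 is DRDY alone, with no ERR bit, which is in line with the "(timeout)" Emask.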

Raja Appuswamy | 2 Aug 2011 20:45

question regarding ncq implementation

Hey guys,
   Over the past few days I've been trying to modify the
AHCI driver of an experimental research operating system to support
NCQ. I already have a fully functional AHCI driver without command
queuing. But when I try to use the READ FPDMA QUEUED command instead of
the READ DMA command, I get an interrupt with the P0IS register containing
the mask 0x45000008 (TFES, OFS and SDBS bits set). On dumping the log
page, I get 41h as status (DRDY and ATA_ERR, if I am right) and 84h as
feature/error. The non-NCQ driver works just fine without any issues,
and there is absolutely no way the hardware is at fault this time. I
have checked the Linux code, and an 84h error refers to the ICRC and
ATA_ABORTED bits, if I am not mistaken. I would really appreciate any
help from the devs on how to debug this issue further. Also, are
any of the other values I get back from READ LOG EXT useful for
debugging? If so, how do I interpret them? I know this is not a
Linux-related question, so if you think I should be targeting a different
mailing list, I'd really appreciate it if you could point me in
the right direction, and I'll definitely stop bugging you guys.
Thanks in advance,
Raj
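Here is a small C sketch for turning a raw AHCI interrupt status value (PxIS; P0IS for port 0) into bit names. The bit positions are taken from the AHCI specification, and the table and function names are made up for this example. Decoding 0x45000008 this way also shows bit 26, INFS (interface non-fatal error), in addition to TFES, OFS and SDBS.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct bit_name {
	uint32_t mask;
	const char *name;
};

#define N_ELEMS(a) (sizeof(a) / sizeof((a)[0]))

/* AHCI PxIS (port interrupt status) bits, highest first. */
static const struct bit_name pxis_bits[] = {
	{ 1u << 31, "CPDS" },  /* cold port detect */
	{ 1u << 30, "TFES" },  /* task file error */
	{ 1u << 29, "HBFS" },  /* host bus fatal error */
	{ 1u << 28, "HBDS" },  /* host bus data error */
	{ 1u << 27, "IFS" },   /* interface fatal error */
	{ 1u << 26, "INFS" },  /* interface non-fatal error */
	{ 1u << 24, "OFS" },   /* overflow */
	{ 1u << 23, "IPMS" },  /* incorrect port multiplier */
	{ 1u << 22, "PRCS" },  /* PhyRdy change */
	{ 1u << 7,  "DMPS" },  /* device mechanical presence */
	{ 1u << 6,  "PCS" },   /* port connect change */
	{ 1u << 5,  "DPS" },   /* descriptor processed */
	{ 1u << 4,  "UFS" },   /* unknown FIS */
	{ 1u << 3,  "SDBS" },  /* Set Device Bits FIS received */
	{ 1u << 2,  "DSS" },   /* DMA Setup FIS received */
	{ 1u << 1,  "PSS" },   /* PIO Setup FIS received */
	{ 1u << 0,  "DHRS" },  /* D2H Register FIS received */
};

/* Writes the names of all bits set in value, space-separated, to buf
 * (truncating if buf is too small). */
static void decode_bits(uint32_t value, const struct bit_name *tbl, size_t n,
			char *buf, size_t len)
{
	buf[0] = '\0';
	for (size_t i = 0; i < n; i++) {
		if (!(value & tbl[i].mask))
			continue;
		if (buf[0] != '\0')
			strncat(buf, " ", len - strlen(buf) - 1);
		strncat(buf, tbl[i].name, len - strlen(buf) - 1);
	}
}
```

Per the specification, OFS indicates that the HBA received more bytes from the device than the command's PRD table specified.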
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Timo Juhani Lindfors | 3 Aug 2011 01:27

frequent spin down (acer aspire 5552 / WDC WD5000BPVT-22HXZT1, FwRev=01.01A01)

Hi,

The HDD in my Acer Aspire 5552 seems to spin down aggressively (about
once a minute) at the GRUB prompt, in single-user mode and in GNOME. This
happens both on AC power and on battery.

hdparm -S 0 /dev/sda

does not help but

hdparm -B 255 /dev/sda

seems to help. I'm using Debian squeeze. More information about the
installation is available at http://lindi.iki.fi/lindi/acer_aspire_5552/
but here are the basics:

/dev/sda:

 Model=WDC WD5000BPVT-22HXZT1, FwRev=01.01A01, SerialNo=WD-WXF1AC074916
 Config={ HardSect NotMFM HdSw>15uSec SpinMotCtl Fixed DTR>5Mbs FmtGapReq }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=50
 BuffType=unknown, BuffSize=8192kB, MaxMultSect=16, MultSect=16
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=976773168
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio3 pio4 
 DMA modes:  mdma0 mdma1 mdma2 
 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6 
 AdvancedPM=yes: unknown setting WriteCache=enabled
 Drive conforms to: Unspecified:  ATA/ATAPI-1,2,3,4,5,6,7
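The two hdparm settings tried above are described in the hdparm(8) man page. -S sets the drive's standby timer, where 0 disables it. -B sets the APM level: 1 to 127 permit spin-down, 128 to 254 do not, and 255 disables APM on drives that support that. Below is a C sketch of both encodings; the function names are illustrative.

```c
/* Classification of an "hdparm -B" APM level, per hdparm(8). */
enum apm_class {
	APM_INVALID,
	APM_ALLOWS_SPINDOWN,  /* 1..127 */
	APM_NO_SPINDOWN,      /* 128..254 */
	APM_DISABLED          /* 255 */
};

static enum apm_class classify_apm(int level)
{
	if (level >= 1 && level <= 127)
		return APM_ALLOWS_SPINDOWN;
	if (level >= 128 && level <= 254)
		return APM_NO_SPINDOWN;
	if (level == 255)
		return APM_DISABLED;
	return APM_INVALID;
}

/* Standby timeout in seconds for an "hdparm -S" value, per hdparm(8):
 * 0 disables the timer, 1..240 are multiples of 5 s, 241..251 are
 * 1..11 units of 30 min, 252 is 21 min, 255 is 21 min 15 s.
 * Returns -1 for 253 (vendor-defined) and 254 (reserved). */
static long standby_timeout_seconds(int s)
{
	if (s == 0)
		return 0;
	if (s >= 1 && s <= 240)
		return 5L * s;
	if (s >= 241 && s <= 251)
		return (s - 240) * 30L * 60;
	if (s == 252)
		return 21L * 60;
	if (s == 255)
		return 21L * 60 + 15;
	return -1;
}
```

This is consistent with the report. -S 0 only turns off the host-set standby timer, while -B 255 turns off APM, which on this drive appears to be what triggers the spin-downs.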


Brian Norris | 3 Aug 2011 02:06

Re: ahci_start_engine compliance with AHCI spec

Hi Tejun,

I wanted to follow up a bit on your last comments.

On Fri, Jul 22, 2011 at 2:03 AM, Tejun Heo <tj <at> kernel.org> wrote:
> The problem is that both my and your approach aren't ultimately safe
> on this particular IP block.  I don't think it's possible to make things
> completely safe for it.  There's no mutual exclusion against PHY
> events - be it flaky signal, power surge or actual hotplug - and
> driver operation.  No matter how careful the driver behaves, if PHY
> events happen after the last check before starting DMA engine, DRQ may
> be set by the time driver gets to it.

Can DRQ be set from 0->1 without a software-initiated action? I didn't
think it was directly tied to PHY events, and so we can fairly well
predict that it will remain 0.

On the other hand, PxSSTS.DET can be affected by PHY, but I don't
believe DET != 3 directly triggers this hardware bug.

> The IP block you're dealing with is inherently buggy.  What the spec
> means, I think, is the DMA engine might not start or behave properly
> if enabled while DRQ is set, which is fine.  Driver will notice that,
> reset stuff and retry.  It is *completely* different from "the
> controller becomes a brick until power-cycled if that happens".  So, we
> can work around all we want but that is one buggy controller.  If
> possible, please tell the manufacturer or licensor to fix it.

Yes, I believe the hardware designers know how buggy this is...but
it's still worth some effort to fix the software as well as possible
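Below is a C sketch of the pre-start checks discussed in this thread. It is written as a pure function over a register snapshot and checks that FRE is set, that BSY and DRQ are clear in PxTFD, and that PxSSTS.DET equals 3h. The bit positions follow the AHCI and ATA specifications. The macro, struct and function names are hypothetical, and this is not ahci_start_engine() itself.

```c
#include <stdbool.h>
#include <stdint.h>

#define PXCMD_FRE        (1u << 4)  /* FIS receive enable */
#define ATA_STS_BSY      0x80u      /* PxTFD.STS.BSY */
#define ATA_STS_DRQ      0x08u      /* PxTFD.STS.DRQ */
#define PXSSTS_DET_MASK  0x0fu
#define PXSSTS_DET_OK    0x3u       /* device present, PHY communication up */

/* Snapshot of the port registers read before setting PxCMD.ST. */
struct port_regs {
	uint32_t cmd;   /* PxCMD  */
	uint32_t tfd;   /* PxTFD  */
	uint32_t ssts;  /* PxSSTS */
};

/* True if the snapshot satisfies the checks: FRE set, BSY and DRQ
 * clear, and DET == 3h. */
static bool may_start_engine(const struct port_regs *r)
{
	if (!(r->cmd & PXCMD_FRE))
		return false;
	if (r->tfd & (ATA_STS_BSY | ATA_STS_DRQ))
		return false;
	return (r->ssts & PXSSTS_DET_MASK) == PXSSTS_DET_OK;
}
```

As Tejun points out, this is only a snapshot. A PHY event after the check can still set DRQ before ST is written, so a true result narrows the window rather than closing it.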

Tejun Heo | 4 Aug 2011 11:15

[PATCH #upstream-fixes] pata_via: disable ATAPI DMA on AVERATEC 3200

On AVERATEC 3200, pata_via causes memory corruption with ATAPI DMA,
which often leads to random kernel oops.  The cause of the problem is
not well understood yet and only a small subset of machines using the
controller seem affected.  Blacklist ATAPI DMA on the machine.

Signed-off-by: Tejun Heo <tj <at> kernel.org>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=11426
Reported-and-tested-by: Jim Bray <jimsantelmo <at> gmail.com>
Cc: Alan Cox <alan <at> linux.intel.com>
Cc: stable <at> kernel.org
---
 drivers/ata/pata_via.c |   18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/drivers/ata/pata_via.c b/drivers/ata/pata_via.c
index 65e4be6..8e9f504 100644
--- a/drivers/ata/pata_via.c
+++ b/drivers/ata/pata_via.c
@@ -124,6 +124,17 @@ static const struct via_isa_bridge {
 	{ NULL }
 };

+static const struct dmi_system_id no_atapi_dma_dmi_table[] = {
+	{
+		.ident = "AVERATEC 3200",
+		.matches = {
+			DMI_MATCH(DMI_BOARD_VENDOR, "AVERATEC"),
+			DMI_MATCH(DMI_BOARD_NAME, "3200"),
+		},
+	},
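The rest of the patch is truncated here, so how the table is consulted is not shown. For readers unfamiliar with DMI quirk tables, here is a userspace C sketch of the matching semantics that DMI_MATCH() entries have in the kernel: every listed field must match, and each match is a substring test. The rule table mirrors the AVERATEC 3200 entry, but the names are hypothetical. On a live system, the strings could be read from /sys/class/dmi/id/board_vendor and /sys/class/dmi/id/board_name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One rule: both fields must occur as substrings of the board strings. */
struct dmi_rule {
	const char *board_vendor;
	const char *board_name;
};

static const struct dmi_rule no_atapi_dma_rules[] = {
	{ "AVERATEC", "3200" },
};

static bool dmi_rule_matches(const struct dmi_rule *r,
			     const char *vendor, const char *name)
{
	return strstr(vendor, r->board_vendor) != NULL &&
	       strstr(name, r->board_name) != NULL;
}

/* True if any rule in the table matches the given board strings. */
static bool board_needs_no_atapi_dma(const char *vendor, const char *name)
{
	size_t n = sizeof no_atapi_dma_rules / sizeof no_atapi_dma_rules[0];

	for (size_t i = 0; i < n; i++)
		if (dmi_rule_matches(&no_atapi_dma_rules[i], vendor, name))
			return true;
	return false;
}
```

Because matching is by substring, a vendor string such as "AVERATEC Inc." would also match, while a different board name would not.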

Tejun Heo | 4 Aug 2011 11:44

Re: ahci_start_engine compliance with AHCI spec

Hello,

On Tue, Aug 02, 2011 at 05:06:13PM -0700, Brian Norris wrote:
> Can DRQ be set from 0->1 without a software-initiated action? I didn't
> think it was directly tied to PHY events, and so we can fairly well
> predict that it will remain 0.

Sure it can.  It's updated by Register D2H FISes sent by the device.
It can change after a PHY event; otherwise, I don't think it would get
set during init without any command in flight, right?
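To illustrate Tejun's point: in AHCI, the HBA copies the Status and Error bytes of each received D2H Register FIS into PxTFD, so the device alone can flip DRQ. Below is a minimal C sketch of that update for the D2H Register FIS case only. It assumes FIS type 34h, Status in byte 2 and Error in byte 3, and that PxTFD holds status in bits 7:0 and error in bits 15:8. The function name is hypothetical. PIO Setup and Set Device Bits FISes also update PxTFD, but they are not modelled here.

```c
#include <stdbool.h>
#include <stdint.h>

#define FIS_TYPE_REG_D2H 0x34
#define ATA_DRQ          0x08u

/* Updates a PxTFD-style value from a received Register Device-to-Host
 * FIS (5 dwords). Returns false and leaves *tfd unchanged for any
 * other FIS type. */
static bool apply_d2h_fis(const uint8_t fis[20], uint32_t *tfd)
{
	if (fis[0] != FIS_TYPE_REG_D2H)
		return false;
	*tfd = (*tfd & ~0xffffu) | ((uint32_t)fis[3] << 8) | fis[2];
	return true;
}
```

For example, a D2H FIS carrying status 58h (DRDY, DSC, DRQ) sets DRQ in PxTFD without any command having been issued by the driver.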

> Yes, I believe the hardware designers know how buggy this is...but
> it's still worth some effort to fix the software as well as possible
> for current hardware behavior.

Yeap, sure, let's get it working for the majority of cases.  I just wanted
to point out that the hardware eventually needs to be fixed.

Thanks.

-- 
tejun

Raja Appuswamy | 5 Aug 2011 15:54

regarding NCQ implementation

Hey guys,
   Over the past few days I've been trying to modify the
AHCI driver of an experimental research operating system to support
NCQ. I already have a fully functional AHCI driver without command
queuing. But when I try to use the READ FPDMA QUEUED command instead of
the READ DMA command, I get an interrupt with the P0IS register containing
the mask 0x45000008 (TFES, OFS and SDBS bits set). SERR reports
0x3000400, which is a protocol violation error with the DIAG field
reporting an unknown FIS.
On dumping the log page, I get 41h as status (DRDY and ATA_ERR, if I am
right) and 84h as feature/error. The non-NCQ driver works just fine
without any issues, and there is absolutely no way the hardware is at
fault this time. I have checked the Linux code, and an 84h error refers
to the ICRC and ATA_ABORTED bits, if I am not mistaken. I would really
appreciate any help from the devs on how to debug this issue further.
Also, are any of the other values I get back from READ LOG EXT useful for
debugging? If so, how do I interpret them? I know this is not a
Linux-related question, so if you think I should be targeting a different
mailing list, I'd really appreciate it if you could point me in
the right direction, and I'll definitely stop bugging you guys.
Thanks in advance,
Raja
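Below is a C sketch that splits the values in this second report into named bits. The PxSERR names use the SError ERR/DIAG field letters from the SATA specification, and the status and error bits are the standard ATA ones. The table and function names are made up for this example. Besides ERR.P (protocol error) and DIAG.F (unknown FIS type), 0x3000400 also has DIAG.T (transport state transition error) set. The log-page values 41h and 84h decode to DRDY+ERR and ICRC+ABRT.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct bit_name {
	uint32_t mask;
	const char *name;
};

#define N_ELEMS(a) (sizeof(a) / sizeof((a)[0]))

/* PxSERR (SError): ERR field in bits 15:0, DIAG field in bits 31:16. */
static const struct bit_name serr_bits[] = {
	{ 1u << 0,  "ERR.I" },   /* recovered data integrity error */
	{ 1u << 1,  "ERR.M" },   /* recovered communications error */
	{ 1u << 8,  "ERR.T" },   /* transient data integrity error */
	{ 1u << 9,  "ERR.C" },   /* persistent communication/data integrity error */
	{ 1u << 10, "ERR.P" },   /* protocol error */
	{ 1u << 11, "ERR.E" },   /* internal error */
	{ 1u << 16, "DIAG.N" },  /* PhyRdy change */
	{ 1u << 17, "DIAG.I" },  /* PHY internal error */
	{ 1u << 18, "DIAG.W" },  /* COMWAKE detected */
	{ 1u << 19, "DIAG.B" },  /* 10b to 8b decode error */
	{ 1u << 20, "DIAG.D" },  /* disparity error */
	{ 1u << 21, "DIAG.C" },  /* CRC error */
	{ 1u << 22, "DIAG.H" },  /* handshake error */
	{ 1u << 23, "DIAG.S" },  /* link sequence error */
	{ 1u << 24, "DIAG.T" },  /* transport state transition error */
	{ 1u << 25, "DIAG.F" },  /* unknown FIS type */
	{ 1u << 26, "DIAG.X" },  /* device exchanged */
};

/* ATA Status and Error register bits relevant here. */
static const struct bit_name ata_status_bits[] = {
	{ 0x80, "BSY" }, { 0x40, "DRDY" }, { 0x20, "DF" },
	{ 0x08, "DRQ" }, { 0x01, "ERR" },
};
static const struct bit_name ata_error_bits[] = {
	{ 0x80, "ICRC" }, { 0x40, "UNC" }, { 0x10, "IDNF" }, { 0x04, "ABRT" },
};

/* Writes the names of all bits set in value, space-separated, to buf. */
static void decode_bits(uint32_t value, const struct bit_name *tbl, size_t n,
			char *buf, size_t len)
{
	buf[0] = '\0';
	for (size_t i = 0; i < n; i++) {
		if (!(value & tbl[i].mask))
			continue;
		if (buf[0] != '\0')
			strncat(buf, " ", len - strlen(buf) - 1);
		strncat(buf, tbl[i].name, len - strlen(buf) - 1);
	}
}
```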

Kushal Koolwal | 5 Aug 2011 20:58

Re: PATA RDC IDE driver READ DMA error messages

> Whenever a Compact Flash is present on the IDE controller, I get the following ATA error
> handling messages before the IDE Linux kernel driver (pata_rdc.ko)
> finally settles down (after almost a minute or so) to PIO4 speed for the Compact Flash:
I suspect this may be because the pata_rdc driver, unlike the Intel
piix driver, does not handle all the cases (cable types and DMA modes).
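Below is a simplified C sketch of the kind of mode limiting alluded to above. The usable UDMA modes are the intersection of what the device and the host support. They are further capped at UDMA2 (UDMA/33) when a 40-conductor cable is detected, since higher UDMA modes require an 80-conductor cable. The struct, enum and function names are hypothetical, and this is not the libata implementation.

```c
#include <stdint.h>

/* Hypothetical, simplified capability masks: bit n set = mode n usable. */
struct xfer_masks {
	uint8_t pio;    /* PIO0..PIO4 */
	uint8_t mwdma;  /* MWDMA0..MWDMA2 */
	uint8_t udma;   /* UDMA0..UDMA6 */
};

enum cable_type { CABLE_40WIRE, CABLE_80WIRE };

/* UDMA modes above UDMA2 (UDMA/33) require an 80-conductor cable. */
#define UDMA_MASK_40C 0x07u

/* Intersects device and host UDMA support, then applies the cable cap. */
static struct xfer_masks limit_udma(struct xfer_masks dev, uint8_t host_udma,
				    enum cable_type cable)
{
	dev.udma &= host_udma;
	if (cable == CABLE_40WIRE)
		dev.udma &= UDMA_MASK_40C;
	return dev;
}

/* Highest mode number set in mask, or -1 if none. */
static int highest_mode(uint8_t mask)
{
	for (int i = 7; i >= 0; i--)
		if (mask & (1u << i))
			return i;
	return -1;
}
```

For example, a device supporting UDMA0 to UDMA6 behind a host limited to UDMA2 ends up at UDMA2 with either cable type.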

