Dan Williams | 2 Feb 09:04

Re: [PATCH v4 10/10] libsas: feed the scsi_block_when_processing_errors() meter

On Mon, Jan 16, 2012 at 9:11 PM, Dan Williams <dan.j.williams <at> intel.com> wrote:
> This is called per-sdev but in the sas-transport case this waits for the
> entire domain to recover which is never guaranteed to be less than 120
> seconds with libata taking nearly a minute per-device to recover.  Ping
> the waitqueue so that the hung task timer knows we're still making
> progress.

I'm now not so sure about this one.  This was implemented before the
change to scan ata devices asynchronously, when serial discovery could
take inordinate amounts of time.

Jack already reported a scan time reduction from 3 minutes down to 5
seconds with the async code, so I'll drop this patch.

Even if recovery were taking a long time, this patch would only move the
hung task timeout backtrace to another location in the kernel that
depends on the completion of error handling.

--
Dan
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Srivatsa S. Bhat | 2 Feb 09:38

Re: Regression 3.2 -> 3.3-rc1 10 sec hang at boot and resume, COMRESET failed

Adding some Cc's.

On 02/02/2012 10:42 AM, Norbert Preining wrote:

> Dear all,
> 
> (please Cc)
> 
> since going from 3.2 to 3.3-rc1 I see a kind of regression wrt booting
> time, namely the detection of the CD drive hangs for 10sec (both
> dmesg were taken on the same hardware on the same day):
> 
> with 3.3-rc1 and rc2:
> [    3.694012] sd 0:0:0:0: [sda] Attached SCSI disk
> [    9.004013] ata2: link is slow to respond, please be patient (ready=0)
> [   13.652013] ata2: COMRESET failed (errno=-16)
> [   13.972022] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> [   13.975721] ata2.00: ACPI cmd 00/00:00:00:00:00:a0 (NOP) rejected by device (Stat=0x51 Err=0x04)
> [   13.977166] ata2.00: ATAPI: MATSHITADVD-RAM UJ862AS, 1.21, max UDMA/33
> [   13.981294] ata2.00: ACPI cmd 00/00:00:00:00:00:a0 (NOP) rejected by device (Stat=0x51 Err=0x04)
> [   13.982734] ata2.00: configured for UDMA/33
> [   13.987964] scsi 1:0:0:0: CD-ROM            MATSHITA DVD-RAM UJ862AS  1.21 PQ: 0 ANSI: 5
> [   13.991482] sr0: scsi3-mmc drive: 24x/24x writer dvd-ram cd/rw xa/form2 cdda tray
> [   13.992971] cdrom: Uniform CD-ROM driver Revision: 3.20
> [   13.994574] sr 1:0:0:0: Attached scsi CD-ROM sr0
> [   13.994672] sr 1:0:0:0: Attached scsi generic sg1 type 5
> [   14.316021] ata5: SATA link down (SStatus 0 SControl 300)
> [   14.636019] ata6: SATA link down (SStatus 0 SControl 300)
> 
> with 3.2:

Lin Ming | 3 Feb 02:15

Re: Regression 3.2 -> 3.3-rc1 10 sec hang at boot and resume, COMRESET failed

On Thu, 2012-02-02 at 14:08 +0530, Srivatsa S. Bhat wrote:
> Adding some Cc's.
> 
> On 02/02/2012 10:42 AM, Norbert Preining wrote:
> 
> > Dear all,
> > 
> > (please Cc)
> > 
> > since going from 3.2 to 3.3-rc1 I see a kind of regression wrt booting
> > time, namely the detection of the CD drive hangs for 10sec (both
> > dmesg were taken on the same hardware on the same day):
> > 
> > with 3.3-rc1 and rc2:
> > [    3.694012] sd 0:0:0:0: [sda] Attached SCSI disk
> > [    9.004013] ata2: link is slow to respond, please be patient (ready=0)
> > [   13.652013] ata2: COMRESET failed (errno=-16)
> > [   13.972022] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)

I'm looking into this problem.
But I can't reproduce this regression.

Could you attach the full 3.3-rc2 dmesg?

Thanks,
Lin Ming

> > [   13.975721] ata2.00: ACPI cmd 00/00:00:00:00:00:a0 (NOP) rejected by device (Stat=0x51 Err=0x04)
> > [   13.977166] ata2.00: ATAPI: MATSHITADVD-RAM UJ862AS, 1.21, max UDMA/33
> > [   13.981294] ata2.00: ACPI cmd 00/00:00:00:00:00:a0 (NOP) rejected by device (Stat=0x51 Err=0x04)

Lin Ming | 3 Feb 05:21

Re: Regression 3.2 -> 3.3-rc1 10 sec hang at boot and resume, COMRESET failed

On Fri, 2012-02-03 at 09:15 +0800, Lin Ming wrote:
> On Thu, 2012-02-02 at 14:08 +0530, Srivatsa S. Bhat wrote:
> > Adding some Cc's.
> > 
> > On 02/02/2012 10:42 AM, Norbert Preining wrote:
> > 
> > > Dear all,
> > > 
> > > (please Cc)
> > > 
> > > since going from 3.2 to 3.3-rc1 I see a kind of regression wrt booting
> > > time, namely the detection of the CD drive hangs for 10sec (both
> > > dmesg were taken on the same hardware on the same day):
> > > 
> > > with 3.3-rc1 and rc2:
> > > [    3.694012] sd 0:0:0:0: [sda] Attached SCSI disk
> > > [    9.004013] ata2: link is slow to respond, please be patient (ready=0)
> > > [   13.652013] ata2: COMRESET failed (errno=-16)
> > > [   13.972022] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> 
> I'm looking into this problem.
> But I can't reproduce this regression.
> 
> Could you attach the full 3.3-rc2 dmesg?

And could you do a bisect?

First, you can check if commit 318893e is good or bad.

If it's bad, then you only need to do a bisect between v3.2 and 318893e.

Norbert Preining | 3 Feb 06:24

Re: Regression 3.2 -> 3.3-rc1 10 sec hang at boot and resume, COMRESET failed

On Fr, 03 Feb 2012, Lin Ming wrote:
> And could you do a bisect?

Done that. First failing commit is:
-----
commit 7faa33da9b7add01db9f1ad92c6a5d9145e940a7
Author: Tejun Heo <tj <at> kernel.org>
Date:   Fri Jul 22 11:41:26 2011 +0200

    ahci: start engine only during soft/hard resets

Previous commit booted without delay.

I didn't try to revert that commit on top of HEAD.

Suggestions?

Best wishes

Norbert
------------------------------------------------------------------------
Norbert Preining            preining@{jaist.ac.jp, logic.at, debian.org}
JAIST, Japan                                 TeX Live & Debian Developer
DSA: 0x09C5B094   fp: 14DF 2E6C 0307 BE6D AD76  A9C0 D2BF 4AA3 09C5 B094
------------------------------------------------------------------------
Program aborting:
Close all that you have worked on.
You ask far too much.
                       --- Windows Error Haiku
--

Lin Ming | 3 Feb 06:34

Re: Regression 3.2 -> 3.3-rc1 10 sec hang at boot and resume, COMRESET failed

On Fri, 2012-02-03 at 14:24 +0900, Norbert Preining wrote:
> On Fr, 03 Feb 2012, Lin Ming wrote:
> > And could you do a bisect?
> 
> Done that. First failing commit is:
> -----
> commit 7faa33da9b7add01db9f1ad92c6a5d9145e940a7
> Author: Tejun Heo <tj <at> kernel.org>
> Date:   Fri Jul 22 11:41:26 2011 +0200
> 
>     ahci: start engine only during soft/hard resets
> 
> Previous commit booted without delay.
> 
> I didn't try to revert that commit on top of HEAD.

Please revert that commit to test.
That helps us to make sure we find the exact bad commit.

Thanks.

> 
> Suggestions?
> 
> Best wishes
> 
> Norbert
> ------------------------------------------------------------------------
> Norbert Preining            preining@{jaist.ac.jp, logic.at, debian.org}
> JAIST, Japan                                 TeX Live & Debian Developer

Norbert Preining | 3 Feb 06:43

Re: Regression 3.2 -> 3.3-rc1 10 sec hang at boot and resume, COMRESET failed

On Fr, 03 Feb 2012, Lin Ming wrote:
> > I didn't try to revert that commit on top of HEAD.
> 
> Please revert that commit to test.
> That helps us to make sure we find the exact bad commit.

Confirmed. Reverting 7faa33da9b7 on top of 6c073a7ee250 makes
the boot delay go away. dmesg from this boot is attached.

Best wishes

Norbert
------------------------------------------------------------------------
Norbert Preining            preining@{jaist.ac.jp, logic.at, debian.org}
JAIST, Japan                                 TeX Live & Debian Developer
DSA: 0x09C5B094   fp: 14DF 2E6C 0307 BE6D AD76  A9C0 D2BF 4AA3 09C5 B094
------------------------------------------------------------------------
PELUTHO (n.) A South American ball game. The balls are whacked against
a brick wall with a stout wooden bat until the prisoner confesses.
			--- Douglas Adams, The Meaning of Liff
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Linux version 3.3.0-rc2+ (norbert <at> mithrandir) (gcc version 4.6.2 (Debian 4.6.2-12) ) #32 SMP PREEMPT Fri Feb 3 14:36:13 JST 2012
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-3.3.0-rc2+ root=/dev/sda3 ro rootfstype=ext4 "acpi_osi=!Windows 2006" nmi_watchdog=0
[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
[    0.000000]  BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)

Lin Ming | 3 Feb 09:27

Re: Regression 3.2 -> 3.3-rc1 10 sec hang at boot and resume, COMRESET failed

On Fri, 2012-02-03 at 14:43 +0900, Norbert Preining wrote:
> On Fr, 03 Feb 2012, Lin Ming wrote:
> > > I didn't try to revert that commit on top of HEAD.
> > 
> > Please revert that commit to test.
> > That helps us to make sure we find the exact bad commit.
> 
> Confirmed. Reverting 7faa33da9b7 on top of 6c073a7ee250 makes
> the boot delay go away. dmesg from this boot is attached.

I dug into the code, but I can't find where the problem is.

Anyway, does the DEBUG patch below help?
Let's always stop the engine during hard reset.

diff --git a/drivers/ata/libahci.c b/drivers/ata/libahci.c
index a72bfd0..8fef702 100644
--- a/drivers/ata/libahci.c
+++ b/drivers/ata/libahci.c
@@ -578,10 +578,6 @@ int ahci_stop_engine(struct ata_port *ap)

 	tmp = readl(port_mmio + PORT_CMD);

-	/* check if the HBA is idle */
-	if ((tmp & (PORT_CMD_START | PORT_CMD_LIST_ON)) == 0)
-		return 0;
-
 	/* setting HBA to idle */
 	tmp &= ~PORT_CMD_START;
 	writel(tmp, port_mmio + PORT_CMD);

Amit Sahrawat | 3 Feb 13:26

PATCH 1/1] scsi: retrieve cache mode using ATA_16 if normal routine fails

It has been observed that a number of USB HDDs do not respond correctly
to the SCSI MODE SENSE command (retrieving the caching mode page), which
results in their write cache being ignored by the request queue, i.e., WCE
is left set to '0' (disabled) and no cache flushes are issued.
This results in filesystem corruption when the device
is unplugged abruptly.

So, in order to identify such devices correctly, make a last attempt
using ATA_16 after the normal routine fails.

Signed-off-by: Amit Sahrawat <a.sahrawat <at> samsung.com>
Signed-off-by: Namjae Jeon <namjae.jeon <at> samsung.com>

---
 drivers/ata/libata-scsi.c |   51 +++++++++++++++++++++++++++++++++++++++++++++
 drivers/scsi/sd.c         |   17 +++++++++++++++
 include/linux/libata.h    |    3 ++
 3 files changed, 71 insertions(+), 0 deletions(-)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 508a60b..d5b00e6 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -562,6 +562,57 @@ error:
 }

 /**
+ *      ata_get_cachestatus - Handler to get write cache status
+ *                      using ATA_16 scsi command
+ *      @scsidev: Device to which we are issuing command

Amit Sahrawat | 3 Feb 13:59

[PATCH 1/1] scsi: retrieve cache mode using ATA_16 if normal routine fails

It has been observed that a number of USB HDDs do not respond correctly
to the SCSI MODE SENSE command (retrieving the caching mode page), which
results in their write cache being ignored by the request queue, i.e., WCE
is left set to '0' (disabled) and no cache flushes are issued.
This results in filesystem corruption when the device
is unplugged abruptly.

So, in order to identify such devices correctly, make a last attempt
using ATA_16 after the normal routine fails.

Signed-off-by: Amit Sahrawat <a.sahrawat <at> samsung.com>
Signed-off-by: Namjae Jeon <namjae.jeon <at> samsung.com>

---
 drivers/ata/libata-scsi.c |   51 +++++++++++++++++++++++++++++++++++++++++++++
 drivers/scsi/sd.c         |   17 +++++++++++++++
 include/linux/libata.h    |    3 ++
 3 files changed, 71 insertions(+), 0 deletions(-)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 508a60b..d5b00e6 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -562,6 +562,57 @@ error:
 }

 /**
+ *      ata_get_cachestatus - Handler to get write cache status
+ *                      using ATA_16 scsi command
+ *      @scsidev: Device to which we are issuing command

