Mark Lord | 1 Sep 2009 01:45

Re: MD/RAID time out writing superblock

Mark Lord wrote:
> Tejun Heo wrote:
>> Ric Wheeler wrote:
> ..
>>> The drive might take a longer time like this when doing error handling
>>> (sector remapping, etc), but then I would expect to see your remapped
>>> sector count grow.
>>
>> Yes, this is a possibility and according to the spec, libata EH should
>> be retrying flushes a few times before giving up but I'm not sure
>> whether keeping retrying for several minutes is a good idea either.
>> Is it?
> ..
> 
> Libata will retry only when the FLUSH returns an error,
> and the next FLUSH will continue after the point where
> the first attempt failed.
> 
> But if the drive can still auto-relocate sectors, then the
> first FLUSH won't actually fail.. it will simply take longer
> than normal.
> 
> A couple of those, and we're into the tens of seconds range
> for time.
> 
> Still, it would be good to actually produce an error like that
> to examine under controlled circumstances.
> 
> Hmm.. I had a drive here that gave symptoms like that.
> Eventually, I discovered that drive had run out of relocatable

Mike Hokenson | 1 Sep 2009 04:00

Fwd: Long delay when booting with SATA DVD on Marvell 88SE6121

(sorry, resent to linux-ide without the html)

On Mon, Aug 31, 2009 at 9:02 AM, Tejun Heo <tj <at> kernel.org> wrote:
>
> > [    6.312129] ata2.00: qc timeout (cmd 0xa1)
> > [    6.312135] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> > [   11.352006] ata2: link is slow to respond, please be patient (ready=0)
> > [   16.336009] ata2: device not ready (errno=-16), forcing hardreset
> > [   21.532007] ata2: link is slow to respond, please be patient (ready=0)
> > [   26.348009] ata2: SRST failed (errno=-16)
> > [   31.544008] ata2: link is slow to respond, please be patient (ready=0)
> > [   36.360014] ata2: SRST failed (errno=-16)
> > [   36.544307] ata2.00: ATAPI: ASUS    DRW-2014L1T, 1.02, max UDMA/66
> > [   36.560317] ata2.00: configured for UDMA/66
> > [   36.576547] scsi 1:0:0:0: CD-ROM            ASUS     DRW-2014L1T      1.02 PQ: 0 ANSI: 5
>
> "libata.force=2:udma33" would force it to udma/33 but I don't think
> that would be the problem here.  IDENTIFY doesn't use udma anyway.
> Does the problem go away if you do "libata.force=2:pio0"?

Thanks for the suggestion, but it didn't help. libata is built as a
module in Debian's kernel, so I created /etc/modprobe.d/libata.conf
containing "options libata force=2:pio0" and rebuilt the initrd. The
force flag appears to have been registered properly, but it seems to
have been applied only after the kernel had worked through the issue
with the controller/port:

[    3.985969] usbhid: v2.6:USB HID core driver
[    6.316130] ata2.00: qc timeout (cmd 0xa1)

Steven Whitehouse | 1 Sep 2009 11:06

Re: [PATCH 2/7] block: use blkdev_issue_discard in blk_ioctl_discard

Hi,

For the GFS2 bits:

Acked-by: Steven Whitehouse <swhiteho <at> redhat.com>

Steve.

On Sat, 2009-08-29 at 19:03 -0400, Christoph Hellwig wrote:
> plain text document attachment (discard-unify-code)
> blk_ioctl_discard duplicates large amounts of code from blkdev_issue_discard,
> the only difference between the two is that blkdev_issue_discard needs to
> send a barrier discard request and blk_ioctl_discard a non-barrier one,
> and blk_ioctl_discard needs to wait on the request.  To facilitate this
> add a flags argument to blkdev_issue_discard to control both aspects of the
> behaviour.  This will be very useful later on for using the waiting
> functionality for other callers.
> 
> Based on an earlier patch from Matthew Wilcox <matthew <at> wil.cx>.
> 
> 
> Signed-off-by: Christoph Hellwig <hch <at> lst.de>
> 
> Index: linux-2.6/block/blk-barrier.c
> ===================================================================
> --- linux-2.6.orig/block/blk-barrier.c	2009-08-29 16:49:43.067370900 -0300
> +++ linux-2.6/block/blk-barrier.c	2009-08-29 17:43:30.407344330 -0300
>  @@ -348,6 +348,9 @@ static void blkdev_discard_end_io(struct
>  		clear_bit(BIO_UPTODATE, &bio->bi_flags);
>  	}

Huang, Shane | 1 Sep 2009 14:22

RE: [PATCH #upstream, v2] ahci: Implement SATA AHCI FIS-based switching support

Hi Tejun, 

> -----Original Message-----
> From: Tejun Heo [mailto:tj <at> kernel.org] 
> 
> 
> Why does fbs_need_dec need to be in ahci_port_priv?  Can't it be a
> local variable of error_intr()?

Yes, it should be replaced by a local variable.

> > +static void ahci_fbs_dec_intr(struct ata_port *ap)
> > +{
> > +	struct ahci_port_priv *pp = ap->private_data;
> > +	DPRINTK("ENTER\n");
> > +
> > +	if (pp->fbs_enabled) {
> 
> Just do BUG_ON(!pp->fbs_enabled);

OK.

 
> > +		void __iomem *port_mmio = ahci_port_base(ap);
> > +		u32 fbs = readl(port_mmio + PORT_FBS);
> > +		int retries = 3;
> > +
> > +		/* time to wait for DEC is not specified by AHCI spec,
> > +		 * add a retry loop for safety */
> > +		do {
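The retry idea in the quoted patch (set DEC, then poll a bounded number of times because the AHCI spec gives no completion time) can be sketched in user space roughly as below. This is only an illustration under stated assumptions: the DEC bit position, the mock register, and the names mock_fbs_reg, mock_readl(), mock_writel() and fbs_clear_device_error() are hypothetical and are not the code from the patch, whose loop body is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PORT_FBS_DEC (1u << 1)   /* device error clear (assumed bit position) */

/* Hypothetical mock of a PxFBS register: DEC reads back as set
 * until busy_reads reads have been consumed. */
struct mock_fbs_reg {
	uint32_t value;
	int busy_reads;
};

static uint32_t mock_readl(struct mock_fbs_reg *reg)
{
	if (reg->busy_reads > 0)
		reg->busy_reads--;
	else
		reg->value &= ~PORT_FBS_DEC;   /* "hardware" clears DEC */
	return reg->value;
}

static void mock_writel(struct mock_fbs_reg *reg, uint32_t v)
{
	reg->value = v;
}

/* Set DEC, then poll a bounded number of times for it to clear.
 * Returns true if DEC was cleared within the retry budget. */
static bool fbs_clear_device_error(struct mock_fbs_reg *reg, int retries)
{
	uint32_t fbs = reg->value;

	mock_writel(reg, fbs | PORT_FBS_DEC);
	fbs = mock_readl(reg);
	while ((fbs & PORT_FBS_DEC) && retries-- > 0)
		fbs = mock_readl(reg);   /* a real driver would delay here */

	return !(fbs & PORT_FBS_DEC);
}
```

With a retry budget of 3, a register that clears DEC on the third read succeeds, while one that stays busy for ten reads is reported as a failure instead of spinning forever.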

Tejun Heo | 1 Sep 2009 14:28

Re: [PATCH #upstream, v2] ahci: Implement SATA AHCI FIS-based switching support

Hello, Shane.

Huang, Shane wrote:
>> You can do
>>
>> 	pmp = fbs >> PORT_FBS_DWE_OFFSET;
>> 	if (pmp < ap->nr_pmp_links &&
>> 	    ata_link_online(&ap->pmp_link[pmp])) {
>> 		link = &ap->pmp_link[pmp];
>> 		pp->fbs_need_dec = true;
>> 	}
> 
> PORT_FBS_SDE also needs to be checked, because (AHCI v1.2, 3.3.16):
> Device With Error (DWE): Set by hardware to the value of the
> Port Multiplier port number of the device that experienced a fatal
> error condition. This field is only valid when PxFBS.SDE = '1'.
> 
> So I'll update the code to:
> 	u32 fbs = readl(port_mmio + PORT_FBS);
> 	int pmp = fbs >> PORT_FBS_DWE_OFFSET;
> 
> 	if ((fbs & PORT_FBS_SDE) && (pmp < ap->nr_pmp_links) &&
> 	    ata_link_online(&ap->pmp_link[pmp])) {
> 		link = &ap->pmp_link[pmp];
> 		fbs_need_dec = true;
> 	}

Yeap, I missed the condition while trying to point out that the loop
wasn't necessary there.  Sorry.  :-P
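The check discussed above can be sketched in user space as follows. It is only an illustration: the bit positions (SDE at bit 2, DWE in bits 19:16) are assumed from the AHCI 1.2 PxFBS description, and struct fbs_err, fbs_decode_error() and the nr_pmp_links parameter are hypothetical stand-ins rather than libata code. The ata_link_online() test is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed PxFBS layout (AHCI 1.2, section 3.3.16). */
#define PORT_FBS_SDE        (1u << 2)   /* single device error */
#define PORT_FBS_DWE_OFFSET 16          /* device with error, bits 19:16 */
#define PORT_FBS_DWE_MASK   0xfu

/* Hypothetical result type, not part of libata. */
struct fbs_err {
	bool valid;   /* true only if SDE is set and pmp is in range */
	int  pmp;     /* port multiplier port number of the failed device */
};

static struct fbs_err fbs_decode_error(uint32_t fbs, int nr_pmp_links)
{
	struct fbs_err err = { .valid = false, .pmp = -1 };
	int pmp = (int)((fbs >> PORT_FBS_DWE_OFFSET) & PORT_FBS_DWE_MASK);

	/* DWE is meaningful only while SDE is set. */
	if ((fbs & PORT_FBS_SDE) && pmp < nr_pmp_links) {
		err.valid = true;
		err.pmp = pmp;
	}
	return err;
}
```

For a register value of 0x00030004 and five PMP links, this reports a valid error on PMP port 3. The same DWE field with SDE clear is ignored, which is the case the extra condition guards against.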


Andrei Tanas | 1 Sep 2009 15:07

Re: MD/RAID time out writing superblock

>>>> The drive might take a longer time like this when doing error handling
>>>> (sector remapping, etc), but then I would expect to see your remapped
>>>> sector count grow.
>>>
>>> Yes, this is a possibility and according to the spec, libata EH should
>>> be retrying flushes a few times before giving up but I'm not sure
>>> whether keeping retrying for several minutes is a good idea either.
>>> Is it?
>> ..
>> 
>> Libata will retry only when the FLUSH returns an error,
>> and the next FLUSH will continue after the point where
>> the first attempt failed.
>> 
>> But if the drive can still auto-relocate sectors, then the
>> first FLUSH won't actually fail.. it will simply take longer
>> than normal.
>> 
>> A couple of those, and we're into the tens of seconds range
>> for time.
>> 
>> Still, it would be good to actually produce an error like that
>> to examine under controlled circumstances.
>> 
>> Hmm.. I had a drive here that gave symptoms like that.
>> Eventually, I discovered that drive had run out of relocatable
>> sectors, too.  Mmm.. I'll see if I can get it back (loaned it out)
>> and perhaps we can recreate this specific scenario on it..
> ..
> 

Huang, Shane | 1 Sep 2009 15:08

RE: [PATCH #upstream, v2] ahci: Implement SATA AHCI FIS-based switching support

Hi Tejun, 

> -----Original Message-----
> From: Tejun Heo [mailto:tj <at> kernel.org] 
> 
> >>> +	/* check FBS capability */
> >>> +	if ((hpriv->cap & HOST_CAP_FBS) && sata_pmp_supported(ap)) {
> >>> +		void __iomem *port_mmio = ahci_port_base(ap);
> >>> +		u32 cmd = readl(port_mmio + PORT_CMD);
> >>> +		if (cmd & PORT_CMD_FBSCP)
> >>> +			pp->fbs_supported = true;
> >> Maybe whine a bit if CAP indicates FBS but PORT_CMD doesn't?
> > 
> > Sure, updated as below:
> > 	if (cmd & PORT_CMD_FBSCP)
> > 		pp->fbs_supported = true;
> > 	else
> > 		WARN_ON(1);
> 
> WARN_ON() would be a tad too scary, given that on certain hardware
> it would always trigger.  A dev_printk() would be better.

Well..., then:
	if (cmd & PORT_CMD_FBSCP)
		pp->fbs_supported = true;
	else
		dev_printk(KERN_WARNING, ap->host->dev,
			   "The port is not capable of FBS\n");

Quoting myself:

Tejun Heo | 1 Sep 2009 15:14

Re: [PATCH #upstream, v2] ahci: Implement SATA AHCI FIS-based switching support

Hello, Shane.

Huang, Shane wrote:
> Quoting myself:
>> static void ahci_disable_fbs(struct ata_port *ap)
>> {
>> 	struct ahci_port_priv *pp = ap->private_data;
>> 	void __iomem *port_mmio = ahci_port_base(ap);
>> 	u32 fbs;
>> 	int rc;
>>
>> 	if (!pp->fbs_supported)
>> 		return;
>>
>> 	WARN_ON(!pp->fbs_enabled);
>>
>> 	rc = ahci_stop_engine(ap);
> 
> I find that ahci_pmp_detach() is called for each SATA port
> during initialization, right after printing:
>> ahci 0000:00:11.0: flags: 64bit ncq sntf ilck pm led clo pmp fbs...
> 
> so ahci_disable_fbs() is called as well, which triggers the
> WARN_ON().

Ah.. right.  ahci_port_resume() does that to make sure that all PMP
bits are cleared on init.  Hmmm... it would probably be better to make
ahci_disable_fbs() just do it regardless of whether libata thinks a
PMP is attached.  After resume from STR, we shouldn't be
assuming anything about the controller state.

Mark Lord | 1 Sep 2009 15:15

Re: MD/RAID time out writing superblock

Andrei Tanas wrote:
>>>>> The drive might take a longer time like this when doing error handling
>>>>> (sector remapping, etc), but then I would expect to see your remapped
>>>>> sector count grow.
>>>> Yes, this is a possibility and according to the spec, libata EH should
>>>> be retrying flushes a few times before giving up but I'm not sure
>>>> whether keeping retrying for several minutes is a good idea either.
>>>> Is it?
>>> ..
>>>
>>> Libata will retry only when the FLUSH returns an error,
>>> and the next FLUSH will continue after the point where
>>> the first attempt failed.
>>>
>>> But if the drive can still auto-relocate sectors, then the
>>> first FLUSH won't actually fail.. it will simply take longer
>>> than normal.
>>>
>>> A couple of those, and we're into the tens of seconds range
>>> for time.
>>>
>>> Still, it would be good to actually produce an error like that
>>> to examine under controlled circumstances.
>>>
>>> Hmm.. I had a drive here that gave symptoms like that.
>>> Eventually, I discovered that drive had run out of relocatable
>>> sectors, too.  Mmm.. I'll see if I can get it back (loaned it out)
>>> and perhaps we can recreate this specific scenario on it..
>> ..
>>

Tejun Heo | 1 Sep 2009 15:30

Re: MD/RAID time out writing superblock

Hello,

Mark Lord wrote:
>> Mine errored out again with exactly the same symptoms, this time after
>> only a few days and with the "tunable" set to 2 sec. I got a warranty
>> replacement but haven't shipped this one yet. Let me know if you want it.
> ..
> 
> Not me.  But perhaps Tejun ?

I think you're much more qualified than me on the subject. :-)

Anyone else?  Ric, are you interested in playing with the drive?

-- 
tejun