Micah Anderson | 1 Dec 2009 19:27

long self-test aborted by an interrupting command from host


I've been having some issues with some drives on a system, so I figured
a good thing to do would be to run the smartctl -t short and -t long
tests perhaps on a regular schedule. 

Turns out the short tests complete fine, but the long tests result in
the following:

General SMART Values:
Offline data collection status:  (0x85)	Offline data collection activity
					was aborted by an interrupting command from host.
					Auto Offline Data Collection: Enabled.
...

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Interrupted (host reset)      40%     11051         -
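For anyone who wants to check the result of a scheduled test automatically, the self-test log row above can be picked apart with a small script. This is only a sketch: it assumes the column layout shown in this message (fields separated by runs of two or more spaces), and the names `SelfTestEntry` and `parse_selftest_row` are made up for illustration, not part of smartmontools.

```python
import re
from typing import NamedTuple, Optional


class SelfTestEntry(NamedTuple):
    num: int
    description: str
    status: str
    remaining_pct: int
    lifetime_hours: int
    lba_of_first_error: Optional[int]


# Assumption: columns are separated by two or more spaces, as in the log above.
ROW = re.compile(
    r"^#\s*(?P<num>\d+)\s{2,}(?P<desc>.+?)\s{2,}(?P<status>.+?)\s{2,}"
    r"(?P<rem>\d+)%\s+(?P<hours>\d+)\s+(?P<lba>\S+)\s*$"
)


def parse_selftest_row(line: str) -> Optional[SelfTestEntry]:
    """Parse one '# N ...' row of the self-test log; return None if it does not match."""
    m = ROW.match(line.strip())
    if m is None:
        return None
    lba = m.group("lba")
    return SelfTestEntry(
        num=int(m.group("num")),
        description=m.group("desc"),
        status=m.group("status"),
        remaining_pct=int(m.group("rem")),
        lifetime_hours=int(m.group("hours")),
        # "-" means no failing LBA was recorded
        lba_of_first_error=int(lba) if lba.isdigit() else None,
    )


def was_interrupted(entry: SelfTestEntry) -> bool:
    """True when the status column starts with 'Interrupted'."""
    return entry.status.startswith("Interrupted")
```

Feeding it the row above gives an entry whose status is "Interrupted (host reset)" with 40% remaining, which is the condition a scheduled job would want to flag.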

Looking in my dmesg, I see the following happen:

[1006133.798423] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[1006133.805959] ata1.00: cmd b0/d5:01:09:4f:c2/00:00:00:00:00/00 tag 0 pio 512 in
[1006133.805963]          res 40/00:00:06:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[1006133.810590] ata1.00: status: { DRDY }
[1006133.814461] ata1: hard resetting link
[1006134.291186] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[1006134.330429] ata1.00: max_sectors limited to 256 for NCQ
[1006134.350428] ata1.00: max_sectors limited to 256 for NCQ
[1006134.355850] ata1.00: configured for UDMA/133
[1006134.360263] ata1: EH complete
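The `cmd` line in the dmesg output above can be decoded to see which command timed out. The sketch below assumes the usual libata field order (command/feature:count:LBA low:LBA mid:LBA high/...), and the lookup tables are a small, incomplete subset of the ATA SMART feature codes and log addresses. The function name `decode_cmd` is hypothetical.

```python
import re

# Partial tables, for illustration only.
SMART_COMMAND = 0xB0
SMART_FEATURES = {
    0xD0: "SMART READ DATA",
    0xD4: "SMART EXECUTE OFF-LINE IMMEDIATE",
    0xD5: "SMART READ LOG",
    0xD6: "SMART WRITE LOG",
    0xD8: "SMART ENABLE OPERATIONS",
    0xD9: "SMART DISABLE OPERATIONS",
    0xDA: "SMART RETURN STATUS",
}
SMART_LOGS = {
    0x00: "Log directory",
    0x01: "Summary SMART error log",
    0x06: "SMART self-test log",
    0x09: "Selective self-test log",
}

# Matches "cmd b0/d5:01:09:4f:c2/" and captures the command and first five bytes.
CMD = re.compile(r"cmd ([0-9a-f]{2})/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})/")


def decode_cmd(line):
    """Decode the first taskfile group of a libata 'cmd' line, or return None."""
    m = CMD.search(line)
    if m is None:
        return None
    command = int(m.group(1), 16)
    feature, count, lba_low, lba_mid, lba_high = (
        int(b, 16) for b in m.group(2).split(":")
    )
    is_smart = command == SMART_COMMAND
    return {
        "command": command,
        "is_smart": is_smart,
        "feature_name": SMART_FEATURES.get(feature, "unknown") if is_smart else None,
        "sector_count": count,
        # For SMART READ LOG, the LBA low byte carries the log address.
        "log_address": lba_low,
        "log_name": SMART_LOGS.get(lba_low, "unknown") if feature == 0xD5 else None,
        # SMART commands carry the signature 0x4F / 0xC2 in LBA mid / LBA high.
        "smart_signature_ok": (lba_mid, lba_high) == (0x4F, 0xC2),
    }
```

Applied to the line above, this reports a SMART READ LOG of log address 0x09 with a valid SMART signature, which suggests the host was reading a SMART log when the timeout occurred.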

Justin Piszcz | 1 Dec 2009 19:42

Re: long self-test aborted by an interrupting command from host


On Tue, 1 Dec 2009, Micah Anderson wrote:

>
> I've been having some issues with some drives on a system, so I figured
> a good thing to do would be to run the smartctl -t short and -t long
> tests perhaps on a regular schedule.
>
> Turns out the short tests complete fine, but the long tests result in
> the following:
>
> General SMART Values:
> Offline data collection status:  (0x85)	Offline data collection activity
> 					was aborted by an interrupting command from host.
> 					Auto Offline Data Collection: Enabled.
> ...
>
> SMART Self-test log structure revision number 1
> Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
> # 1  Extended offline    Interrupted (host reset)      40%     11051         -
>
> Looking in my dmesg, I see the following happen:
>
> [1006133.798423] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
> [1006133.805959] ata1.00: cmd b0/d5:01:09:4f:c2/00:00:00:00:00/00 tag 0 pio 512 in
> [1006133.805963]          res 40/00:00:06:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
> [1006133.810590] ata1.00: status: { DRDY }
> [1006133.814461] ata1: hard resetting link
> [1006134.291186] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [1006134.330429] ata1.00: max_sectors limited to 256 for NCQ

Tim Small | 1 Dec 2009 20:58

Re: long self-test aborted by an interrupting command from host

Micah Anderson wrote:
> SMART Self-test log structure revision number 1
> Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
> # 1  Extended offline    Interrupted (host reset)      40%     11051         -
>
> Looking in my dmesg, I see the following happen:
>
> [1006133.798423] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
> [1006133.805959] ata1.00: cmd b0/d5:01:09:4f:c2/00:00:00:00:00/00 tag 0 pio 512 in
> [1006133.805963]          res 40/00:00:06:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
>   

> I'm trying to figure out where to go from here, is this a kernel issue
> in the SATA subsystem or something similar? 
>
> I'm running the Debian Lenny provided 2.6.26-2 kernel, using
> 5.38-2+lenny1 version of smartmontools.
>   

More likely to be a drive firmware bug, I would have thought - looks
like a command is issued, and the drive goes away.  Does the same thing
work with different drive models?  Also what SATA controller are you
using (plus drive model/firmware rev etc.?), and have you checked for
drive firmware updates?  You could also try turning NCQ off (set queue
length to 1 via the control file under /sys ) - although the error
report shows tag 0, so this probably isn't it...
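The queue-length suggestion above can be sketched as follows. On Linux the per-device control file is normally `/sys/block/<device>/device/queue_depth`; the device name (`sda` below) and the helper names are assumptions for illustration, and writing to the real file needs root. The `sysfs_root` parameter exists only so the function can be tried against a scratch directory first.

```python
from pathlib import Path


def queue_depth_path(device, sysfs_root="/sys"):
    """Return the sysfs control file holding the queue depth for a block device."""
    return Path(sysfs_root) / "block" / device / "device" / "queue_depth"


def set_queue_depth(device, depth=1, sysfs_root="/sys"):
    """Write a new queue depth (1 effectively disables NCQ); return the old value."""
    path = queue_depth_path(device, sysfs_root)
    previous = int(path.read_text().strip())
    path.write_text(f"{depth}\n")
    return previous
```

For example, `old = set_queue_depth("sda", 1)` would set the depth to 1 and keep the previous value so it can be restored later with `set_queue_depth("sda", old)`. The setting does not survive a reboot.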

Ta,

Tim.

Justin Piszcz | 1 Dec 2009 22:00

Re: long self-test aborted by an interrupting command from host


On Tue, 1 Dec 2009, Tim Small wrote:

> Micah Anderson wrote:
>> SMART Self-test log structure revision number 1
>> Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
>> # 1  Extended offline    Interrupted (host reset)      40%     11051         -
>>
>> Looking in my dmesg, I see the following happen:
>>
>> [1006133.798423] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
>> [1006133.805959] ata1.00: cmd b0/d5:01:09:4f:c2/00:00:00:00:00/00 tag 0 pio 512 in
>> [1006133.805963]          res 40/00:00:06:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
>>
>
>> I'm trying to figure out where to go from here, is this a kernel issue
>> in the SATA subsystem or something similar?
>>
>> I'm running the Debian Lenny provided 2.6.26-2 kernel, using
>> 5.38-2+lenny1 version of smartmontools.
>>
>
>
> More likely to be a drive firmware bug, I would have thought - looks
> like a command is issued, and the drive goes away.  Does the same thing
> work with different drive models?  Also what SATA controller are you
> using (plus drive model/firmware rev etc.?), and have you checked for
> drive firmware updates?  You could also try turning NCQ off (set queue
> length to 1 via the control file under /sys ) - although the error
> report shows tag 0, so this probably isn't it...

Justin Piszcz | 1 Dec 2009 22:06

Re: long self-test aborted by an interrupting command from host


On Tue, 1 Dec 2009, Justin Piszcz wrote:

>
>
> On Tue, 1 Dec 2009, Tim Small wrote:
>
>> Micah Anderson wrote:
>>> SMART Self-test log structure revision number 1
>>> Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
>>> # 1  Extended offline    Interrupted (host reset)      40%     11051         -
>>>
>>> Looking in my dmesg, I see the following happen:
>>>
>>> [1006133.798423] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
>>> [1006133.805959] ata1.00: cmd b0/d5:01:09:4f:c2/00:00:00:00:00/00 tag 0 pio 512 in
>>> [1006133.805963]          res 40/00:00:06:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
>>> 
>> 
>>> I'm trying to figure out where to go from here, is this a kernel issue
>>> in the SATA subsystem or something similar?
>>> 
>>> I'm running the Debian Lenny provided 2.6.26-2 kernel, using
>>> 5.38-2+lenny1 version of smartmontools.
>>> 

Stefan Nowak | 1 Dec 2009 23:32

Re: Mac OS X - SCSI pass through - Driver/Code available!

So far, there have been no reactions on this mailing list.
Maybe another place would be better? A feature request ticket?
What are the chances of this particular request being implemented?

Regards, Stefan Nowak

On 2009-11-26 at 20:16 Stefan Nowak wrote:

> Dear smartmontools developers!
>
> Until the very recent version of smartmontools (5.39), SMART through  
> USB was not possible on Mac OS X, because the Mac OS X kernel does  
> not support SCSI pass through. That was the reasoning of Christian  
> Franke.
>
> Meanwhile I googled for:
> http://www.google.com/search?q=mac+osx+scsi+passthrough+usb
>
> And found a person, who wrote a SCSI pass through driver for Mac OS X:
> http://tinyco.de/2009/02/04/writing-a-mac-osx-usb-device-driver-with-scsi-pass-through.html
>
> In the site's comment section I asked as Stefan Nowak whether this  
> code could be used for passing through SMART:
> http://tinyco.de/2009/02/04/writing-a-mac-osx-usb-device-driver-with-scsi-pass-through.html#comment-24129787
>
> And the developer answered within the site's comment section as  
> wagerlabs, that it should work:
> http://tinyco.de/2009/02/04/writing-a-mac-osx-usb-device-driver-with-scsi-pass-through.html#comment-24130057
>
> I don't know low-level coding, otherwise I would offer my help.  

micah anderson | 2 Dec 2009 17:48

Re: long self-test aborted by an interrupting command from host

Hi,

Excerpts from Tim Small's message of Tue Dec 01 14:58:04 -0500 2009:
> Micah Anderson wrote:
> > SMART Self-test log structure revision number 1
> > Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
> > # 1  Extended offline    Interrupted (host reset)      40%     11051         -
> >
> > Looking in my dmesg, I see the following happen:
> >
> > [1006133.798423] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
> > [1006133.805959] ata1.00: cmd b0/d5:01:09:4f:c2/00:00:00:00:00/00 tag 0 pio 512 in
> > [1006133.805963]          res 40/00:00:06:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
> >   
> 
> > I'm trying to figure out where to go from here, is this a kernel issue
> > in the SATA subsystem or something similar? 
> >
> > I'm running the Debian Lenny provided 2.6.26-2 kernel, using
> > 5.38-2+lenny1 version of smartmontools.
> 
> More likely to be a drive firmware bug, I would have thought - looks
> like a command is issued, and the drive goes away.  Does the same thing
> work with different drive models? 

I have another system with the same SATA controller but a different
drive; both drives are Western Digital. The drives getting these resets
are model WDC WD1001FALS-00J7B0, and the other system, with the same
SATA controller, has model WDC WD5001AALS-00L3B2

Justin Piszcz | 2 Dec 2009 18:05

Re: long self-test aborted by an interrupting command from host


On Wed, 2 Dec 2009, micah anderson wrote:

> Hi,
>
> Excerpts from Tim Small's message of Tue Dec 01 14:58:04 -0500 2009:
>> Micah Anderson wrote:
>>> SMART Self-test log structure revision number 1
>>> Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
>>> # 1  Extended offline    Interrupted (host reset)      40%     11051         -
>>>
>>> Looking in my dmesg, I see the following happen:
>>>
>>> [1006133.798423] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
>>> [1006133.805959] ata1.00: cmd b0/d5:01:09:4f:c2/00:00:00:00:00/00 tag 0 pio 512 in
>>> [1006133.805963]          res 40/00:00:06:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
>>>
>>

micah,

Did you check the link I sent to the list regarding the WD Velociraptors?

Looks like the same issue.

Justin.


Christian Franke | 2 Dec 2009 22:23

Re: Mac OS X - SCSI pass through - Driver/Code available!

Stefan Nowak wrote:
> So far, there have been no reactions on this mailing list.
> Maybe another place would be better? A feature request ticket?
>    

Yes, please open a new ticket in our wiki.

> What are the chances of this particular request being implemented?
>    

Developers: Any volunteer?

Cheers,
Christian

Douglas Gilbert | 3 Dec 2009 02:22

Re: [smartmontools-devel] Mac OS X - SCSI pass through - Driver/Code available!

Christian Franke wrote:
> Stefan Nowak wrote:
>> So far, there have been no reactions on this mailing list.
>> Maybe another place would be better? A feature request ticket?
>>    
> 
> Yes, please open a new ticket in our wiki.
> 
> 
>> What are the chances of this particular request being implemented?
>>    
> 
> Developers: Any volunteer?

Unless Apple have changed their policy, there is no generic
SCSI pass-through in OS-X. The last time I checked on this
subject, the suggestion from Apple's OS folks was to write
a kernel driver for each HBA we wanted to support. That is
not practical IMO.

If this policy has changed, could someone supply a URL to
Apple's generic SCSI pass-through interface?

Doug Gilbert


