Dr. Clea F. Rees | 1 Jul 2010 21:10

disappearing extended self tests & other errors

If my question is already answered somewhere, my apologies and I'd be
grateful for a pointer. I did go through the FAQ and search the
archives but I wasn't certain which search terms to use so may well
have missed something.

I've had smartmontools monitoring one disk for some time. I would
occasionally see transient errors. For example, very occasionally, a
self test would fail but later self tests would pass. Earlier this
week, I started to see a rising number of bad ("pending") sectors and a
large number of reallocation attempts. No reallocated sectors, though,
and zero "offline uncorrectable". The raw read error rate fluctuates
wildly and I can't remember if this happened before or not. It might be
in the thousands one moment and 0 a few minutes later. I also saw
repeated self test failures (although some passed).

This is a Mac, so I ran Apple's hardware tests from the install disk.
Initially, I got an error but hadn't realised I should have removed all
peripherals so I did that and the error disappeared. I repeated the
extended test three times without finding any problems.

I was also seeing fsck errors but repeated iterations seemed to resolve
them. In addition, I got two I/O errors when cloning to an external
drive and the system had difficulty accessing some data during normal
operations. (Errors in the system logs.)

I then ran Tech Tool Deluxe from CD, including a surface scan, which
reported no errors.

With the "pending" sector count still rising, I managed to get a
successful cloning operation through and decided to wipe the disk and

Geoff Keating | 3 Jul 2010 21:40

Re: disappearing extended self tests & other errors


On 01/07/2010, at 12:10 PM, Dr. Clea F. Rees wrote:

> So today, I had Disk Utility erase the disk, writing zeros once over
> everything in an attempt to force reallocation of the bad sectors.
> 
> smartctl now tells me I have zero "pending" sectors, zero offline
> uncorrectable and zero reallocated, which I don't understand.

This is normal.  The disk rewrote the sectors successfully.

> The raw
> read error rate continues to fluctuate.

This is normal too.  The SMART output shows

  1 Raw_Read_Error_Rate     0x000b   099   099   062    Pre-fail  Always       -       131072

you can see the 'VALUE' field is 99, well above the problem level of 62, so the fluctuations don't indicate a problem.
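
The comparison described above can be sketched in a few lines of Python. This is only an illustration, assuming the ten-column attribute layout shown in the line quoted above; the names parse_attribute_line and has_failed are made up for this example and are not part of smartmontools. An attribute is treated as failed when its normalized VALUE is less than or equal to THRESH.

```python
from typing import NamedTuple


class SmartAttribute(NamedTuple):
    """One row of the smartctl attribute table (hypothetical helper type)."""
    attr_id: int
    name: str
    flag: int
    value: int      # normalized VALUE column
    worst: int      # WORST column
    thresh: int     # THRESH column
    attr_type: str  # Pre-fail or Old_age
    updated: str
    when_failed: str
    raw: str        # RAW_VALUE column, kept as text


def parse_attribute_line(line: str) -> SmartAttribute:
    """Split one attribute row into its ten whitespace-separated columns."""
    parts = line.split(None, 9)
    if len(parts) < 10:
        raise ValueError("not a SMART attribute line: %r" % line)
    return SmartAttribute(
        int(parts[0]), parts[1], int(parts[2], 16),
        int(parts[3]), int(parts[4]), int(parts[5]),
        parts[6], parts[7], parts[8], parts[9].strip(),
    )


def has_failed(attr: SmartAttribute) -> bool:
    """True when the normalized value has dropped to or below the threshold."""
    return attr.value <= attr.thresh


line = ("  1 Raw_Read_Error_Rate     0x000b   099   099   062"
        "    Pre-fail  Always       -       131072")
attr = parse_attribute_line(line)
print(attr.name, attr.value, attr.thresh, has_failed(attr))
```

The raw value (131072 here) is deliberately left as text, since its meaning is vendor-specific and it is the normalized VALUE that is compared against THRESH.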

> Short self tests are passing
> but extended self tests simply disappear. That is, they are initiated,
> smartctl -a shows the test is in progress and then they vanish. The
> test doesn't fail, it simply never shows up in the log and is no longer
> shown as in progress. The first time I tried running the extended test
> (post-zeroing), I got an error from smartd saying it could not read the
> attribute data but subsequent attempts to run the test have produced no
> errors at all - just temporary indications the test is running and then
> nothing.


cfrees | 5 Jul 2010 00:47

Re: disappearing extended self tests & other errors

First, thank you _very_ much for your reply.

On Sat 3rd Jul, 2010 at 12:40, Geoff Keating seems to have written:

>
> On 01/07/2010, at 12:10 PM, Dr. Clea F. Rees wrote:
>
>> So today, I had Disk Utility erase the disk, writing zeros once over
>> everything in an attempt to force reallocation of the bad sectors.
>>
>> smartctl now tells me I have zero "pending" sectors, zero offline
>> uncorrectable and zero reallocated, which I don't understand.
>
> This is normal.  The disk rewrote the sectors successfully.
>
Good to know. (I thought it would reallocate them if they were bad so
couldn't understand the results.)

>> The raw
>> read error rate continues to fluctuate.
>
> This is normal too.  The SMART output shows
>
>  1 Raw_Read_Error_Rate     0x000b   099   099   062    Pre-fail  Always       -       131072
>
> you can see the 'VALUE' field is 99, well above the problem level of 62, so the fluctuations don't indicate a problem.
>
>> Short self tests are passing
>> but extended self tests simply disappear. That is, they are initiated,
>> smartctl -a shows the test is in progress and then they vanish. The

BitBucket | 6 Jul 2010 05:34

Problem clearing Uncorrectable offline error

Hi:

I'm having a problem clearing a single offline uncorrectable error:

198 Offline_Uncorrectable  0x0012  200  200  000  Old_age  Always - 1

So when I boot the system, smartd sends out a warning message about 
this.

I've written zeros to the drive a number of times, thereby clearing it 
of pending sectors as well as picking up and clearing some new ones, 
exercised it with Spinrite 6 L2 and L4 modes, run smartctl -t long on it 
twice now with no errors, yet I still have this error.

Below are my notes and the most recent test result.  Thank you in 
advance for your help.

-- Roy Zider

Notes on attempt to remove 1 offline uncorrectable error
7/5/2010 8:25:06 PM

Send this note to mailing list

-- Resume more tests
 .. 7/5/2010 8:08:41 PM write up notes
 .. Wrote more zeros with DLGDIAG, retested OK (conveyance tests)
 .. ran Spinrite 6, L2 then L4, but failed to capture report of L4
 .. check with smartctl, long tests 2x no errors
 .. but still 1 offline uncorrectable error (no pending anymore)

Eric Shubert | 6 Jul 2010 22:25

vmware server raw disk

On a VMware Server 2 (CentOS5) host, I'm trying to run smartd on a 
CentOS5 guest which has access to a pair of raw sata disks. The disks 
are accessible, and configured as a raid-1 mirror. (I know this is 
unsupported, but it works).

When I attempt to start smartd configured to monitor the 2 raw disks, I get:
Jul  6 12:46:01 tacs-udat smartd[6316]: Device: /dev/sdb, opened
Jul  6 12:46:01 tacs-udat smartd[6316]: Device: /dev/sdb, not ATA, no 
IDENTIFY DEVICE Structure
Jul  6 12:46:01 tacs-udat smartd[6316]: Unable to register ATA device 
/dev/sdb at line 46 of file /etc/smartd.conf
Jul  6 12:46:01 tacs-udat smartd[6316]: Device /dev/sdb not available
Jul  6 12:46:01 tacs-udat smartd[6316]: Device: /dev/sdc, opened
Jul  6 12:46:01 tacs-udat smartd[6316]: Device: /dev/sdc, not ATA, no 
IDENTIFY DEVICE Structure
Jul  6 12:46:01 tacs-udat smartd[6316]: Unable to register ATA device 
/dev/sdc at line 54 of file /etc/smartd.conf
Jul  6 12:46:01 tacs-udat smartd[6316]: Device /dev/sdc not available
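
For anyone wanting to pull the affected devices out of a longer log, here is a minimal Python sketch. It assumes the syslog message format shown in the excerpt above; the function name unavailable_devices and the SAMPLE text are made up for this illustration and are not part of smartd.

```python
import re
from typing import List

# Matches the final "Device /dev/... not available" message that smartd
# logs in the excerpt above. "Device: /dev/sdb, opened" does not match
# because of the colon after "Device".
UNAVAILABLE = re.compile(r"smartd\[\d+\]: Device (\S+) not available")


def unavailable_devices(log_text: str) -> List[str]:
    """Return device paths reported as not available, in first-seen order."""
    seen: List[str] = []
    for match in UNAVAILABLE.finditer(log_text):
        device = match.group(1)
        if device not in seen:
            seen.append(device)
    return seen


SAMPLE = """\
Jul  6 12:46:01 tacs-udat smartd[6316]: Device: /dev/sdb, opened
Jul  6 12:46:01 tacs-udat smartd[6316]: Device /dev/sdb not available
Jul  6 12:46:01 tacs-udat smartd[6316]: Device: /dev/sdc, opened
Jul  6 12:46:01 tacs-udat smartd[6316]: Device /dev/sdc not available
"""

print(unavailable_devices(SAMPLE))
```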

I've searched, and (not surprisingly) can't find a thing on this.

Is there some setting in the configuration I can use to make this work, 
or is the VMware Server Raw Disk mapping simply unable to handle 
smartctl commands?

Thanks for any pointers.

-- 
-Eric 'shubes'


Wolfgang Breyha | 7 Jul 2010 00:36

LaCie rikiki 0x059f:0x102a -d usbjmicron

Hi!

I got a LaCie rikiki with USB ID 0x059f:0x102a. It is currently unsupported
by smartmontools 5.39.1 included with Fedora 13.

Both -d usbjmicron and -d usbjmicron,x work.

Output included at the end.
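
As a small aid for collecting reports like this one, the IDs can be extracted from the "Unknown USB bridge" message with a short Python sketch. It assumes the message format in the output included below; parse_unknown_bridge is a name made up for this example, and treating the third number as the USB device release (bcdDevice) is my assumption.

```python
import re
from typing import Optional, Tuple

# Pattern for the message smartctl prints for an unrecognized bridge, e.g.
#   /dev/sdf: Unknown USB bridge [0x059f:0x102a (0x100)]
USB_BRIDGE = re.compile(
    r"Unknown USB bridge "
    r"\[0x([0-9a-fA-F]{4}):0x([0-9a-fA-F]{4}) \(0x([0-9a-fA-F]+)\)\]"
)


def parse_unknown_bridge(line: str) -> Optional[Tuple[int, int, int]]:
    """Return (vendor_id, product_id, release) as integers, or None."""
    match = USB_BRIDGE.search(line)
    if match is None:
        return None
    vendor_id, product_id, release = (int(g, 16) for g in match.groups())
    return vendor_id, product_id, release


ids = parse_unknown_bridge(
    "/dev/sdf: Unknown USB bridge [0x059f:0x102a (0x100)]"
)
if ids is not None:
    print("vendor=0x%04x product=0x%04x release=0x%x" % ids)
```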

Regards, Wolfgang

> # smartctl -a /dev/sdf
> smartctl 5.39.1 2010-01-28 r3054 [i386-redhat-linux-gnu] (local build)
> Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
> 
> /dev/sdf: Unknown USB bridge [0x059f:0x102a (0x100)]
> Smartctl: please specify device type with the -d option.
> 
> Use smartctl -h to get a usage summary

> # smartctl -d usbjmicron -a /dev/sdf
> smartctl 5.39.1 2010-01-28 r3054 [i386-redhat-linux-gnu] (local build)
> Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
> 
> === START OF INFORMATION SECTION ===
> Model Family:     Seagate Momentus 5400.6 series
> Device Model:     ST9500325AS
> Serial Number:    xxxxxxxx
> Firmware Version: 0002BSM1
> User Capacity:    500,107,862,016 bytes

Christian Franke | 7 Jul 2010 08:50

Re: vmware server raw disk

Eric Shubert wrote:
> On a VMware Server 2 (CentOS5) host, I'm trying to run smartd on a
> CentOS5 guest which has access to a pair of raw sata disks. The disks
> are accessible, and configured as a raid-1 mirror. (I know this is
> unsupported, but it works).
>
> When I attempt to start smartd configured to monitor the 2 raw disks, I get:
> Jul  6 12:46:01 tacs-udat smartd[6316]: Device: /dev/sdb, opened
> Jul  6 12:46:01 tacs-udat smartd[6316]: Device: /dev/sdb, not ATA, no
> IDENTIFY DEVICE Structure
> Jul  6 12:46:01 tacs-udat smartd[6316]: Unable to register ATA device
> /dev/sdb at line 46 of file /etc/smartd.conf
> Jul  6 12:46:01 tacs-udat smartd[6316]: Device /dev/sdb not available
> Jul  6 12:46:01 tacs-udat smartd[6316]: Device: /dev/sdc, opened
> Jul  6 12:46:01 tacs-udat smartd[6316]: Device: /dev/sdc, not ATA, no
> IDENTIFY DEVICE Structure
> Jul  6 12:46:01 tacs-udat smartd[6316]: Unable to register ATA device
> /dev/sdc at line 54 of file /etc/smartd.conf
> Jul  6 12:46:01 tacs-udat smartd[6316]: Device /dev/sdc not available
>
>    

Which kind of controller does vmware emulate to access the raw disks: 
IDE/ATA or SCSI ?

> I've searched, and (not surprisingly) can't find a thing on this.
>
> Is there some setting in the configuration I can use to make this work,
> or is the VMware Server Raw Disk mapping simply unable to handle
> smartctl commands?

Eric Shubert | 7 Jul 2010 16:21

Re: vmware server raw disk

Christian Franke wrote:
> Eric Shubert wrote:
>> On a VMware Server 2 (CentOS5) host, I'm trying to run smartd on a
>> CentOS5 guest which has access to a pair of raw sata disks. The disks
>> are accessible, and configured as a raid-1 mirror. (I know this is
>> unsupported, but it works).
>>
>> When I attempt to start smartd configured to monitor the 2 raw disks, I get:
>> Jul  6 12:46:01 tacs-udat smartd[6316]: Device: /dev/sdb, opened
>> Jul  6 12:46:01 tacs-udat smartd[6316]: Device: /dev/sdb, not ATA, no
>> IDENTIFY DEVICE Structure
>> Jul  6 12:46:01 tacs-udat smartd[6316]: Unable to register ATA device
>> /dev/sdb at line 46 of file /etc/smartd.conf
>> Jul  6 12:46:01 tacs-udat smartd[6316]: Device /dev/sdb not available
>> Jul  6 12:46:01 tacs-udat smartd[6316]: Device: /dev/sdc, opened
>> Jul  6 12:46:01 tacs-udat smartd[6316]: Device: /dev/sdc, not ATA, no
>> IDENTIFY DEVICE Structure
>> Jul  6 12:46:01 tacs-udat smartd[6316]: Unable to register ATA device
>> /dev/sdc at line 54 of file /etc/smartd.conf
>> Jul  6 12:46:01 tacs-udat smartd[6316]: Device /dev/sdc not available
>>
>>    
> 
> Which kind of controller does vmware emulate to access the raw disks: 
> IDE/ATA or SCSI ?

SCSI emulation. There is a choice between LSI Logic and BUSLOGIC. I 
believe that LSI is the default for linux guests, and BUS is the default 
for windows guests.


Christian Franke | 7 Jul 2010 18:01

Re: vmware server raw disk

Eric Shubert wrote:
> Christian Franke wrote:
>    
>> ...
>> Raw disk probably means that vmware provides transparent READ/WRITE
>> mapping between virtual and physical disk.
>>      
> I believe this is the case.
>
>    
>> This may or may not include
>> the ATA pass-through functionality which is required for smartmontools.
>>      
> (Light bulb goes on)
>
> I didn't realize/remember that smartmontools only works with ATA
> devices. I've only used it for ATA in the past.

smartmontools also supports SCSI SMART, but this is very different from 
ATA SMART.

> I'll look into using
> SATA configuration instead of SCSI emulation, and see what I get.
>
>    

Please report the results to this list if possible.

'hdparm -I /dev/ice' may help to check which optional features of the 
physical disk are actually exposed to the guest OS.

Ron TechGuy | 9 Jul 2010 02:51

MegaRaid/Perc6 "Can't Get Bus number error"

Hi, 

 I have been unable to get smartmontools to query the drives under my MegaRaid 8888ELP SAS controller.

smartctl -a  -d megaraid,0 /dev/megaraid_sas_ioctl_node
smartctl 5.39.1 2010-01-28 r3054 [x86_64-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

Smartctl open device: /dev/megaraid_sas_ioctl_node [megaraid_disk_00] failed: can't get bus number

I also tried using the /dev/megadev0, /dev/sda and /dev/sdb devices, without any success.

dmesg megaraid driver output:
[    2.821783] megaraid cmm: 2.20.2.7 (Release Date: Sun Jul 16 00:01:03 EST 2006)
[    2.822047] megaraid: 2.20.5.1 (Release Date: Thu Nov 16 15:32:35 EST 2006)
[    2.822236] megasas: 00.00.04.01 Thu July 24 11:41:51 PST 2008
[    2.822398] megasas: 0x1000:0x0060:0x1000:0x1006: bus 5:slot 0:func 0
[    2.822621] megaraid_sas 0000:05:00.0: PCI INT A -> GSI 32 (level, low) -> IRQ 32
[    2.822860] megaraid_sas 0000:05:00.0: setting latency timer to 64
[    2.822954] megasas: FW now in Ready state
[    2.880280] scsi0 : LSI SAS based MegaRAID driver
[    2.882048] scsi 0:0:16:0: Direct-Access     IBM-ESXS VPBA300C3ETS11 N A270 PQ: 0 ANSI: 5
[    2.883158] scsi 0:0:17:0: Direct-Access     IBM-ESXS VPBA300C3ETS11 N A270 PQ: 0 ANSI: 5
[    2.884913] scsi 0:0:18:0: Direct-Access     IBM-ESXS VPBA300C3ETS11 N A270 PQ: 0 ANSI: 5
[    2.886056] scsi 0:0:19:0: Direct-Access     IBM-ESXS VPBA300C3ETS11 N A2E2 PQ: 0 ANSI: 5
[    2.888740] scsi 0:0:20:0: Direct-Access     ATA      Hitachi HUA72202 A3EA PQ: 0 ANSI: 5
[    2.891461] scsi 0:0:21:0: Direct-Access     ATA      Hitachi HUA72202 A3EA PQ: 0 ANSI: 5
[    2.900213] scsi 0:2:0:0: Direct-Access     LSI      MegaRAID 8888ELP 1.20 PQ: 0 ANSI: 5
[    2.900751] scsi 0:2:1:0: Direct-Access     LSI      MegaRAID 8888ELP 1.20 PQ: 0 ANSI: 5
[    2.906375] st: Version 20081215, fixed bufsize 32768, s/g segs 256

