Bruce Allen | 1 Apr 2009 07:21

Re: smartctl, reallocated sector count question

David,

Thanks for the update.  I suspect this is probably due to buggy SMART 
firmware on the disk.  When writing and reviewing disk firmware, the 
disk vendors seem to be mostly concerned with performance (read/write 
speed) since this is what gets tested by reviewers and is used by 
customers in determining what to buy.  The SMART part of the firmware is 
often an afterthought, and is not written or reviewed with the same 
attention to detail.

Cheers,
	Bruce

David Mathog wrote:
> Summary so far:  some old Seagate ST340016A disks were found to have
> nonzero 'Offline uncorrectable' and 'Current_Pending_Sector' counts
> which could not be reset to zero by writing to every block on the disk.
> 
> I contacted Seagate about this issue, and the best I could get out of
> them (on the second attempt) was:
> 
> | I understand you are getting unclearable SMART 197 and 198
> | fields on your drive.  We do not recommend tampering with your
> | SMART values on the drive in any way.  We do not have any utility
> | ourselves for clearing these fields.  Seatools is the only valid
> | diagnostic that we use to test the drives for functionality.  If
> | the drive passes both the short and long test of Seatools then 
> | the drive itself is fine.  If it fails the tests then the drive
> | should be replaced.
> 

Tim Small | 1 Apr 2009 10:51

Re: smartctl, reallocated sector count question


> David Mathog wrote:

>> Anyway, no answer on why these particular drives ended up with these
>> counts "stuck".

Hmm.  Just one thought - if you have the LBAs of the original errors, I
wonder if it's worth trying to use hdparm's "--make-bad-sector" and then
"--write-sector" commands?  Bit of a long-shot but worth a try...

Cheers,

Tim.

------------------------------------------------------------------------------
Jorge Bastos | 3 Apr 2009 15:14

Help on stuff

Hi there Bruce,

I've replaced the disks on a small DNS server, and they do not seem to be fully SMART compatible. One of them is not OK: the kernel log reports a damaged sector, but smartd is not sending me the email.

Is this solvable?

NOTE: I added -T to smartd.conf and smartd did not start with it. I changed

DEVICESCAN -m dns <at> email.pt -M exec /usr/share/smartmontools/smartd-runner

to

DEVICESCAN -m dns <at> email.pt -T -M exec /usr/share/smartmontools/smartd-runner

I have this in syslog:

Apr  3 14:30:56 cisne smartd[2457]: smartd version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Apr  3 14:30:56 cisne smartd[2457]: Home page is http://smartmontools.sourceforge.net/
Apr  3 14:30:56 cisne smartd[2457]: Opened configuration file /etc/smartd.conf
Apr  3 14:30:56 cisne smartd[2457]: Drive: DEVICESCAN, implied '-a' Directive on line 22 of file /etc/smartd.conf
Apr  3 14:30:56 cisne smartd[2457]: Configuration file /etc/smartd.conf was parsed, found DEVICESCAN, scanning devices
Apr  3 14:30:56 cisne smartd[2457]: Device: /dev/hda, opened
Apr  3 14:30:56 cisne smartd[2457]: Device: /dev/hda, found in smartd database.
Apr  3 14:30:57 cisne smartd[2457]: Device: /dev/hda, can't monitor Current Pending Sector count - no Attribute 197
Apr  3 14:30:57 cisne smartd[2457]: Device: /dev/hda, can't monitor Offline Uncorrectable Sector count  - no Attribute 198
Apr  3 14:30:57 cisne smartd[2457]: Device: /dev/hda, appears to lack SMART Self-Test log; disabling -l selftest (override with -T permissive Directive)
Apr  3 14:30:57 cisne smartd[2457]: Device: /dev/hda, appears to lack SMART Error log; disabling -l error (override with -T permissive Directive)
Apr  3 14:30:57 cisne smartd[2457]: Device: /dev/hda, is SMART capable. Adding to "monitor" list.
Apr  3 14:30:57 cisne smartd[2457]: Device: /dev/hdc, opened
Apr  3 14:30:57 cisne smartd[2457]: Device: /dev/hdc, found in smartd database.
Apr  3 14:30:57 cisne smartd[2457]: Device: /dev/hdc, can't monitor Current Pending Sector count - no Attribute 197
Apr  3 14:30:57 cisne smartd[2457]: Device: /dev/hdc, can't monitor Offline Uncorrectable Sector count  - no Attribute 198
Apr  3 14:30:57 cisne smartd[2457]: Device: /dev/hdc, appears to lack SMART Self-Test log; disabling -l selftest (override with -T permissive Directive)
Apr  3 14:30:57 cisne smartd[2457]: Device: /dev/hdc, appears to lack SMART Error log; disabling -l error (override with -T permissive Directive)
Apr  3 14:30:57 cisne smartd[2457]: Device: /dev/hdc, is SMART capable. Adding to "monitor" list.
Apr  3 14:30:57 cisne smartd[2457]: Monitoring 2 ATA and 0 SCSI devices
Apr  3 14:30:58 cisne smartd[2507]: smartd has fork()ed into background mode. New PID=2507.
Apr  3 14:30:58 cisne smartd[2507]: file /var/run/smartd.pid written containing PID 2507

------------------------------------------------------------------------------
_______________________________________________
Smartmontools-support mailing list
Smartmontools-support <at> lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/smartmontools-support
Andriy Gapon | 2 Apr 2009 15:43

issue with self-test on newer seagate disks (ahci related?)


I have the same problem as described here:
http://thread.gmane.org/gmane.linux.utilities.smartmontools/5995/focus=6000
and here:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=503439

Does anybody have any guesses?

I use FreeBSD, my system is ICH9-based (DG33TL), the disk is ST3500410AS, AHCI
mode is configured in BIOS.

Full smartctl -a output after some attempts to run various self-tests:

smartctl version 5.38 [amd64-portbld-freebsd7.1] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     ST3500410AS
Serial Number:    5VM0NB43
Firmware Version: CC34
User Capacity:    500,107,862,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Thu Apr  2 16:32:02 2009 EEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 249) Self-test routine in progress...
                                        90% of test remaining.
Total time to complete Offline
data collection:                 ( 600) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (  94) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x103f) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   108   099   006    Pre-fail  Always       -       18711630
  3 Spin_Up_Time            0x0003   100   100   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       2
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   068   060   030    Pre-fail  Always       -       6591276
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       138
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       2
183 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       0
184 Unknown_Attribute       0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   059   055   045    Old_age   Always       -       41 (Lifetime Min/Max 37/45)
194 Temperature_Celsius     0x0022   041   045   000    Old_age   Always       -       41 (0 29 0 0)
195 Hardware_ECC_Recovered  0x001a   048   032   000    Old_age   Always       -       18711630
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       157522220548242
241 Unknown_Attribute       0x0000   100   253   000    Old_age   Offline      -       3431356807
242 Unknown_Attribute       0x0000   100   253   000    Old_age   Offline      -       2746629257

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Conveyance offline  Self-test routine in progress 90%       138         -
# 2  Short offline       Aborted by host               90%       132         -
# 3  Short offline       Aborted by host               90%       132         -
# 4  Extended offline    Aborted by host               80%       116         -
# 5  Extended offline    Aborted by host               90%        46         -
# 6  Extended offline    Aborted by host               90%        39         -
# 7  Extended offline    Aborted by host               60%        37         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
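
The "Self-test execution status: ( 249)" value in the output above packs two fields into one byte. As a minimal sketch (not smartctl's own code), assuming the usual ATA layout in which the high nibble is the status code and the low nibble is the remaining work in steps of 10%, it can be decoded as follows. The function name and the short status table are illustrative only, and only a few well-known codes are listed:

```python
# Hypothetical helper: decode an ATA self-test execution status byte.
# Assumed layout: bits 7..4 = status code, bits 3..0 = percent remaining / 10.
STATUS_NAMES = {
    0x0: "completed without error (or never run)",
    0x1: "aborted by host",
    0x2: "interrupted by host reset",
    0xF: "in progress",
}


def decode_self_test_status(byte):
    """Return (status description, percent of test remaining)."""
    code = (byte >> 4) & 0xF
    remaining = (byte & 0xF) * 10
    return STATUS_NAMES.get(code, f"other (code {code})"), remaining


if __name__ == "__main__":
    # 249 == 0xF9: status code 0xF, 9 * 10 percent remaining
    print(decode_self_test_status(249))
```

With 249 (0xF9) this gives "in progress" and 90, which agrees with the "Self-test routine in progress... 90% of test remaining" text that smartctl printed.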

--

-- 
Andriy Gapon

------------------------------------------------------------------------------
Christian Franke | 6 Apr 2009 12:27

Re: smartctl, reallocated sector count question

David Mathog wrote:
> 
> Summary so far:  some old Seagate ST340016A disks were found to have
> nonzero 'Offline uncorrectable' and 'Current_Pending_Sector' counts
> which could not be reset to zero by writing to every block on the
> disk.
> 

Just for info: the current CVS version of smartd provides a workaround for
this issue:

If '-C 197+ -U 198+' is specified in smartd.conf, a warning is only
issued if the 'Current_Pending_Sector' or 'Offline uncorrectable' raw
value increases. If the new persistence feature ('-s' option) is used,
then this also works across boot cycles.

I will also add '-v' options which will allow this to be enabled from
the drive database.
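
The "warn only on increase" behaviour of '-C 197+ -U 198+' can be illustrated with a small sketch. This is not smartd's implementation; the function names, the JSON state file, and the choice to stay silent on the first observation are all assumptions made for illustration:

```python
import json
from pathlib import Path


def check_increase(attr_id, raw_value, state):
    """Return a warning string if raw_value rose since the last check, else None.

    `state` maps attribute IDs (as strings) to the last raw value seen.
    On the first observation there is nothing to compare against, so this
    sketch stays silent (an assumption, not documented smartd behaviour).
    """
    key = str(attr_id)
    previous = state.get(key)
    state[key] = raw_value
    if previous is not None and raw_value > previous:
        return f"Attribute {attr_id} raw value increased from {previous} to {raw_value}"
    return None


def load_state(path):
    """Load saved raw values (hypothetical JSON state file), or start empty."""
    p = Path(path)
    return json.loads(p.read_text()) if p.exists() else {}


def save_state(path, state):
    """Persist raw values so the comparison survives a restart."""
    Path(path).write_text(json.dumps(state))


if __name__ == "__main__":
    state = {}
    print(check_increase(197, 3, state))  # first observation: no warning
    print(check_increase(197, 3, state))  # stuck at 3: no warning
    print(check_increase(197, 5, state))  # rose to 5: warning
```

A disk whose count is stuck at a nonzero value stays quiet in this scheme, while any further growth is still reported. Saving the state between runs plays the role of the persistence feature mentioned above.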

Cheers,
    Christian

------------------------------------------------------------------------------
Christian Franke | 6 Apr 2009 18:32

Re: Inquiry: USB Devices which support SMART Protocol -- Recommendations?

Stefan Nowak wrote:
> Please tell me your opinion on my assumption:
> If chipset X in product A from producer 1 is supported, then product B  
> from producer 2, which also uses chipset X, very likely also is  
> supported. Right?
>
>   

Yes.

There might be a small risk that this is not true due to vendor specific 
changes to the firmware of the USB bridge.

> DeLOCK 61391 IDE/SATA to USB 2.0 Adapter
>
> Product:
> http://www.delock.de/produkte/suche/Converter_USB_20_zu_SATA_SLASH_IDE_61391.html?setLanguage=EN
> Driver:
> http://www.delock.de/download/driver/61391/A/202
> Chipset Guess: JMicron JM20338
> ...
> JM="JMicron Tech."
> USB\VID_152D&PID_2338.DeviceDesc="JMicron SATA-USB Combo Device"
> UMSS\DISK.DeviceDesc="USB Mass Storage Device"
>
>
> Sharkoon DriveLink
>
> Product:
> http://www.sharkoon.de/html/produkte/externe_gehaeuse/drive_link/index_en.html
> Manual:
> http://www.sharkoon.de/html/support/bedienungsanleitungen/pdf/drivelink_manual_english.pdf
> Driver:
> http://www.sharkoon.de/html/support/treiber/treiber/drivelink_win98_driver.zip
> Chipset Guess: JMicron JM20338
>   

The JM20338 is a USB+SATA->PATA chip. Both products likely have a 
JMicron JM20337 (USB->PATA+SATA). I presume that this is the same chip 
with different firmware.

> ...
> MSFT="JMicron"
> MfgName="JMicron"
> DeviceDesc="JM20338 SATA, USB Combo"
> jmus\DISK.DeviceDesc="USB Mass Storage Device"
> AUTORUN="Software\Microsoft\Windows\CurrentVersion\Run"
>
>
>   

Both products should work with the current CVS version of smartmontools
on all platforms which support SCSI pass-through (this does not include
Mac OS X).
Auto-detection of these devices should also work on Linux and Windows.

If you have any test results for the Wiki, please tell me.

Cheers,
  Christian

------------------------------------------------------------------------------
Deepak SysAdm | 7 Apr 2009 17:21

SMART support for Adaptec JBOD

Hi guys,

I have smartd configured for the drives in the RAID unit on an Adaptec controller (using the /dev/sg devices), and they're working fine. However, I can't get the SMART status of one drive on the controller which is not part of a RAID unit but is set up as JBOD. The error message is given below:

smartctl -a -d scsi /dev/sg1
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Device: Adaptec  JBOD1            Version: V1.0
scsiModePageOffset: response length too short, resp_len=4 offset=4 bd_len=0
>> Terminate command early due to bad response to IEC mode page
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.

-----

whereas it's working fine for /dev/sg2 (1st drive on the RAID1 unit)

Device: SEAGATE  ST373455SS       Version: 0002
Serial number: 3LQ19RCB000098015TFW
Device type: disk
Transport protocol: SAS
Local Time is: Tue Apr  7 10:10:21 2009 CDT
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK

------------------------

I've already tried the '-T permissive' option, but that didn't help. Is there any way to get the JBOD drive working with smartd? Thanks in advance!

Regards,
Deepak

------------------------------------------------------------------------------
This SF.net email is sponsored by:
High Quality Requirements in a Collaborative Environment.
Download a free trial of Rational Requirements Composer Now!
http://p.sf.net/sfu/www-ibm-com

list question

I have 2 questions:

1st question
For easier reading of the smartd.conf file, is it acceptable to specify the tests for each disk separately, as below?
===================
# Short Test Every Night
/dev/sda -a -d sat -o on -S on -s (S/../.././21) -m gianopou -M exec /usr/share/smartmontools/smartd-runner
/dev/sdb -a -d sat -o on -S on -s (S/../.././23) -m gianopou -M exec /usr/share/smartmontools/smartd-runner
/dev/sdc -a -d sat -o on -S on -s (S/../.././21) -m gianopou -M exec /usr/share/smartmontools/smartd-runner
/dev/sdd -a -d sat -o on -S on -s (S/../.././23) -m gianopou -M exec /usr/share/smartmontools/smartd-runner

#Long Test, Wednesday Morning
/dev/sda -a -d sat -o on -S on -s (L/../../3/02) -m gianopou -M exec /usr/share/smartmontools/smartd-runner
/dev/sda -a -d sat -o on -S on -s (L/../../3/04) -m gianopou -M exec /usr/share/smartmontools/smartd-runner
/dev/sda -a -d sat -o on -S on -s (L/../../3/02) -m gianopou -M exec /usr/share/smartmontools/smartd-runner
/dev/sda -a -d sat -o on -S on -s (L/../../3/04) -m gianopou -M exec /usr/share/smartmontools/smartd-runner

# /dev/sda -d sat -H -t -l selftest -m gianopou
/dev/sda -d sat -H -C 0 -U 0 -t -l selftest -m gianopou
/dev/sdb -d sat -H -C 0 -U 0 -t -l selftest -m gianopou
/dev/sdc -d sat -H -C 0 -U 0 -t -l selftest -m gianopou
/dev/sdd -d sat -H -C 0 -U 0 -t -l selftest -m gianopou
=====================================================================
2nd question

Are the messages below from logwatch indicators of impending problems with these hard drives? These are the Seagate 500 GB SATA drives that were delivered to me with the SD15 firmware (which has since been upgraded to the SD1A firmware).

===================
 /dev/sda :
    Prefailure: Raw_Read_Error_Rate (1) changed to
      118, 119, 102, 111, 113, 115, 116, 117, 118, 119, 120, 109,
      113, 115, 117, 118, 119, 120, 108, 112, 114, 116, 117, 118,
      119, 120, 111, 114, 115, 117, 118, 119,
    Usage: Airflow_Temperature_Cel (190) changed to
      65, 64, 65, 66, 65,
    Usage: Hardware_ECC_Recovered (195) changed to
      52, 53, 52, 51, 50, 51, 50, 51, 50, 49, 50, 51,
      50, 49, 51, 49, 50,
    Usage: Temperature_Celsius (194) changed to
      35, 36, 35, 34, 35,
 
 /dev/sdb :
    Prefailure: Raw_Read_Error_Rate (1) changed to
      118, 119, 105, 111, 114, 115, 117, 118, 119, 101, 111, 114,
      115, 117, 118, 119, 120, 109, 113, 115, 117, 118, 119, 105,
      111, 114, 115, 117, 118,
    Prefailure: Seek_Error_Rate (7) changed to
      75,
    Usage: Airflow_Temperature_Cel (190) changed to
      67, 68, 68, 67, 68, 67, 68, 69, 68,
    Usage: Hardware_ECC_Recovered (195) changed to
      53, 54, 56, 55, 56, 55, 56, 58, 57, 55, 56, 55,
      56, 55, 54, 53, 54, 56, 57, 58, 59, 58, 59, 58,
    Usage: Temperature_Celsius (194) changed to
      33, 32, 32, 33, 32, 33, 32, 31, 32,
 
 /dev/sdc :
    Prefailure: Raw_Read_Error_Rate (1) changed to
      113, 115, 117, 118, 119, 110, 114, 116, 117, 118, 119, 109,
      114, 116, 117, 118, 119, 111, 114, 116, 117, 118, 119, 105,
    Usage: Airflow_Temperature_Cel (190) changed to
      66, 67, 68, 67, 68, 67, 68, 67, 68, 67, 66, 65,
      66, 67,
    Usage: Hardware_ECC_Recovered (195) changed to
      34, 33, 29, 31, 33, 34, 35, 54, 53, 52, 53, 60,
      54, 53, 52, 48, 45, 43, 42, 41, 40, 39, 38, 37,
      36, 35,
    Usage: Temperature_Celsius (194) changed to
      34, 33, 32, 33, 32, 33, 32, 33, 32, 33, 34, 35,
      34, 33,
 
 /dev/sdd :
    Prefailure: Raw_Read_Error_Rate (1) changed to
      113, 115, 117, 118, 119, 110, 114, 116, 117, 118, 119, 110,
      114, 116, 117, 118, 119, 99, 111, 114, 116, 117, 118, 119,
      105,
    Usage: Airflow_Temperature_Cel (190) changed to
      69, 71, 70, 71, 70, 69, 68, 69, 70,
    Usage: Hardware_ECC_Recovered (195) changed to
      20, 22, 24, 25, 26, 48, 47, 46, 45, 44, 45, 46,
      51, 49, 48, 49, 48,
    Usage: Temperature_Celsius (194) changed to
      31, 29, 30, 29, 30, 29, 30, 31, 32, 31, 30,


Thanks in Advance

john

Linux System Administrator
john.gianopoulos <at> hs.utc.com

------------------------------------------------------------------------------
Jorge Bastos | 8 Apr 2009 00:08

Weekly test

Hi guys,

How can I tell smartd to perform a test and report errors, bad blocks, etc. weekly?

Thanks.

Jorge

------------------------------------------------------------------------------
Gabriele Pohl | 9 Apr 2009 15:10

Re: Weekly test

Hi Jorge,

"Jorge Bastos" wrote:
> Hi guys,

although you didn't address me, as a female subscriber
of this list, I will answer ;)

> How can I tell smartd, to perform a test and report errors, bad 
> blocks etc, weekly?

Have a look at the CONFIGURATION FILE DIRECTIVES
of smartd.conf
http://smartmontools.sourceforge.net/man/smartd.8.html#lbAJ 
-------------
-s REGEXP
    Run Self-Tests or Offline Immediate Tests, at scheduled times. 

eg:
To schedule a long Self-Test between 4-5am every Sunday morning, use:
 -s L/../../7/04
-------------
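
To see when a pattern such as the one above would fire, here is a rough sketch. It assumes, as the smartd man page describes, that the regular expression is compared against a string of the form T/MM/DD/d/HH, where T is the test type and d is the day of the week with Monday=1 and Sunday=7. The function names are made up for illustration, Python's `re` stands in for the POSIX regular expressions smartd uses, and requiring a match of the whole string is an assumption:

```python
import re
from datetime import datetime


def schedule_string(test_type, when):
    """Build the T/MM/DD/d/HH string for a test type and a point in time.

    isoweekday() numbers the days Monday=1 ... Sunday=7.
    """
    return f"{test_type}/{when:%m}/{when:%d}/{when.isoweekday()}/{when:%H}"


def due_test_type(regex, when):
    """Return the first test type (L, S, C or O) whose string matches, else None."""
    for test_type in "LSCO":
        if re.fullmatch(regex, schedule_string(test_type, when)):
            return test_type
    return None


if __name__ == "__main__":
    sunday_4am = datetime(2009, 4, 5, 4, 30)   # a Sunday, between 4 and 5 am
    monday_4am = datetime(2009, 4, 6, 4, 30)   # the following Monday
    print(schedule_string("L", sunday_4am))
    print(due_test_type("L/../../7/04", sunday_4am))
    print(due_test_type("L/../../7/04", monday_4am))
```

For Sunday 5 April 2009 at 04:30 the string is L/04/05/7/04, so the pattern matches and a long test would be due. On the following Monday nothing matches. Since the day-of-week field is fixed to 7, the test runs once a week.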

HTH,
Gabriele