David Flanagan | 2 Jul 00:41 2003

support for Seagate Barracuda 7200.7 series?


Hi,

I've got a new Seagate Barracuda 7200.7 40GB hard drive, a ST340014A.
It's making a soft but annoying intermittent buzz, which I think might be
related to Seagate's implementation of SMART.

I've found that any hard-drive activity (touch foo; sync) will stop the
buzz for exactly 40 seconds (unless there is more disk activity in the
meantime).  Then it starts again.

The only thing I've found that can make the buzz start on cue is to run
a self test:

      smartctl -t long /dev/hda # -t long starts buzzing; -t offline doesn't

In this case, disk activity doesn't stop the buzzing, but stopping the
test does:

      touch foo; sync;         # keeps on buzzing
      smartctl -X /dev/hda     # stops the buzzing

So, obviously, my hypothesis is that my drive is running an offline test
on itself whenever the drive is idle for 40 seconds.

So I try this:

      smartctl -o off -s off /dev/hda

Unfortunately, it doesn't work.  The buzz persists.

Sander Smeenk | 3 Jul 23:42 2003

Pre-Fail? - Sez who ;)

Hi,

There's a few things that make me wonder how 'reliable' the results
returned by smartctl -a are. 

For example, what determines if an attribute is 'Pre-fail' or 'Old_age' ?
What does this mean? Is it just a counter that, when crossing a
boundary, changes the text from New(?) to Old_age to Pre-fail? Is there
some kind of logic?

Why does this particular disk need 2222 seconds to complete offline
datacollection while another (larger!) disk only needs 241? Quite the
difference in time, I'd say. Is that only a result of the 241secs disk
being an ATA133 & 7200rpm Maxtor?

Or how reliable is this result for example:
| 9 Power_On_Hours 0x0012 255 001 000 Old_age Always - 12098

12098 hours = 504 days = 1.38 yrs of 'power on disk spinning' time ?

And with that number in mind, how could I determine how long ago this
error occurred:

| Timestamp is seconds since the previous disk power-on.
| Note: timestamp "wraps" after 2^32 msec = 49.710 days.
| Error 19 occurred at disk power-on lifetime: 7099 hours

Can I assume the disk had errors occurring at approximately 80% of the
first year of activity (7099/24=295*(100/365)=80.82%), and thereafter no
more errors occurred? The disk is approximately two to three years old...
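
The hour arithmetic above can be checked with a few lines of Python. This is only a sketch: the two hour counts and the 2^32 msec wrap are taken from the output quoted above, and the variable and function names are made up for illustration. Power_On_Hours counts powered-on time rather than calendar time, so the result is how much power-on time has passed since the error, not a date.

```python
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365

def hours_to_days(hours):
    """Convert a power-on hour count to days of powered-on time."""
    return hours / HOURS_PER_DAY

power_on_hours = 12098   # raw value of attribute 9 quoted above
error_at_hours = 7099    # "Error 19 occurred at disk power-on lifetime"

total_days = hours_to_days(power_on_hours)
total_years = total_days / DAYS_PER_YEAR

# Powered-on time that has passed since error 19 was logged.
since_error_hours = power_on_hours - error_at_hours
since_error_days = hours_to_days(since_error_hours)

# The error-log timestamp wraps after 2**32 milliseconds.
wrap_days = 2**32 / 1000 / 3600 / 24

print(f"{total_days:.0f} days = {total_years:.2f} years powered on")
print(f"{since_error_days:.0f} powered-on days since error 19")
print(f"timestamp wraps every {wrap_days:.3f} days")
```

If the disk has been powered continuously since the error, the 4999 hours of difference would put error 19 roughly 208 days back; any powered-off time would push it further into the past.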

Bruce Allen | 6 Jul 16:03 2003

Re: Pre-Fail? - Sez who ;)

Hi Sander,

> There's a few things that make me wonder how 'reliable' the results
> returned by smartctl -a are. 

Please remember that smartctl -a is not doing anything other than
reporting the values returned by the disk's firmware.

> For example, what determines if an attribute is 'Pre-fail' or 'Old_age' ?

This is determined by one bit of the 12-byte Attribute data structure. Its
meaning is defined in SFF-8035i revision 2, and it made it into the ATA-3 and
perhaps the ATA-4 spec.  These are posted on the smartmontools web site under
"REFERENCES".  Please have a look.  Note that if the firmware on your disk
is NOT backwards compatible with the SFF spec (most are) then this bit
may be meaningless.
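
To make the "one bit" concrete, here is a minimal Python sketch that decodes one 12-byte Attribute entry. The field layout (id, 2-byte flags, current value, worst value, 6 raw bytes, 1 reserved byte) and the choice of bit 0 of the flags word as the pre-failure bit are assumptions stated for illustration, not quoted from the spec; please check them against the references. The sample entry is rebuilt from the Power_On_Hours line quoted earlier in the thread (flags 0x0012, value 255, worst 001, raw 12098).

```python
import struct

# Assumed layout: id (1 byte), flags (2 bytes, little-endian), current
# value (1), worst value (1), raw value (6), reserved (1) = 12 bytes.
ATTRIBUTE_FORMAT = "<BHBB6sB"
PREFAIL_BIT = 0x0001  # assumed: bit 0 of the flags word marks Pre-fail

def decode_attribute(entry):
    """Decode one 12-byte Attribute entry into a small dictionary."""
    attr_id, flags, current, worst, raw, _reserved = struct.unpack(
        ATTRIBUTE_FORMAT, entry
    )
    return {
        "id": attr_id,
        "flags": flags,
        "type": "Pre-fail" if flags & PREFAIL_BIT else "Old_age",
        "value": current,
        "worst": worst,
        "raw": int.from_bytes(raw, "little"),
    }

# Rebuild the Power_On_Hours entry from the values printed by smartctl.
sample = struct.pack(
    ATTRIBUTE_FORMAT, 9, 0x0012, 255, 1, (12098).to_bytes(6, "little"), 0
)
decoded = decode_attribute(sample)
print(decoded["type"], decoded["raw"])
```

With flags 0x0012 the lowest bit is clear, which matches the Old_age label in the quoted output. The type is a fixed property of the Attribute, not a state it moves through.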

> What does this mean? Is it just a counter that, when crossing a
> boundary, changes the text from New(?) to Old_age to Pre-fail? Is
> there some kind of logic?

An Attribute is either prefail or oldage.

> Why does this particular disk need 2222 seconds to complete offline
> datacollection while another (larger!) disk only needs 241?  Quite the
> difference in time, I'd say. Is that only a result of the 241secs disk
> being an ATA133 & 7200rpm Maxtor?

Good question.  But you'll have to address it to the disk manufacturer --
this is determined by the firmware.

Sander Smeenk | 6 Jul 18:27 2003

Re: Pre-Fail? - Sez who ;)

Quoting Bruce Allen (ballen <at> gravity.phys.uwm.edu):

> > There's a few things that make me wonder how 'reliable' the results
> > returned by smartctl -a are. 
> Please remember that smartctl -a is not doing anything other than
> reporting the values returned by the disk's firmware.

That is true. And in that case, I doubt the firmware's correctness ;)

> > This all is about a disk in my server which, during normal operation, 
> > doesn't show any weirdnesses at all.
> > 7099 power-on lifetime, but I never noticed anything . . . . . yet.
> The disk is probably fine. The sign of trouble is if you start
> accumulating such errors in large numbers now.

I think this was what I wanted to hear ;)

> Your disk looks fine -- I suggest you do a long self test smartctl -t long
> perhaps once per week, and run smartd to get warnings of any new errors or
> failures.

I already enabled smartd with the -m mail-report option. Now there's a
crontab entry which starts smartctl -t long /dev/hda every week.

Thanks for your reply and your information. I shall read up on this
matter and remember not to start blurting out questions when I should
read the documentation ;)

Sander.

James William Morris | 8 Jul 01:18 2003

Warning: ... -F switch may be needed

Bruce,

  Buried in this dump is a warning to contact you about my hdd, so here it 
is!

James Morris

~(sirromseventyfive)~

smartctl version 5.1-11 Copyright (C) 2002-3 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     SAMSUNG SV0322A
Serial Number:    dW0901190514e6
Firmware Version: JK200-35
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   4
ATA Standard is:  ATA/ATAPI-4 X3T13 1153D revision 7
Local Time is:    Mon Jul  7 00:01:30 2003 BST

==> WARNING: Contact developers; may need -F enabled.

SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:

Bruce Allen | 8 Jul 11:09 2003

Re: Warning: ... -F switch may be needed

Hi James,

Thanks a lot for the output.  It looks as if this disk doesn't need any
firmware corrections applied in the code, because it doesn't have either a
self-test or an error log.

It also doesn't have a power on lifetime or temperature attribute -- these
were also problematic.

I'll add this to the drive database, so you won't get the warning in the
future.

[Note: I'd suggest upgrading to smartmontools 5.1-14, and enabling the
automatic offline testing with -o on and automatic Attribute autosave with
-S on.]

Cheers,
	Bruce

On Tue, 8 Jul 2003, James William Morris wrote:

> Bruce,
> 
>   Buried in this dump is a warning to contact you about my hdd, so here it 
> is!
> 
> James Morris
> 
> ~(sirromseventyfive)~
> 

Fabrizio Di Meo | 9 Jul 10:18 2003

Smartd doesn't execute scripts...

Hi,
     I'm trying to use the -M directive of smartmontools, but nothing happens (even for pre-fail or usage attributes).
 
Below is my smartd.conf:
 
 
#/etc/smartd.conf
 
/dev/hda -H -o on -f -l error -l selftest -m fabriziodimeo <at> yahoo.it,root <at> localhost -t \
-R 1 -R 3 -R 4 -R 5 -R 7 -R 11 -R 13 -M exec /usr/test/run
 
# Monitor SMART status, ATA Error Log, Self-test log, and track
# changes in all attributes except for attribute 194 (-I 194)
/dev/hdd -H -o on -f -l error -l selftest -m fabriziodimeo <at> yahoo.it,root <at> localhost -t \
-R 1 -R 3 -R 4 -R 5 -R 7 -R 11 -R 13 -R 194 -M exec /usr/test/run
 
 
During a temperature change (and using smartd -d -i 30) I get these messages:
 
obiwan:/usr/test # smartd -d -i 10
smartd version 5.1-9 Copyright (C) 2002-3 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
Using configuration file /etc/smartd.conf
Device: /dev/hda, opened
Device: /dev/hda, enabled SMART Automatic Offline Testing.
Device: /dev/hda, is SMART capable. Adding to "monitor" list.
Device: /dev/hdd, opened
Device: /dev/hdd, enabled SMART Automatic Offline Testing.
Device: /dev/hdd, is SMART capable. Adding to "monitor" list.
Started monitoring 2 ATA and 0 SCSI devices
Device: /dev/hdd, SMART Usage Attribute: 194 Temperature_Celsius changed from 253 [Raw 44] to 253 [Raw 42]
Device: /dev/hdd, SMART Usage Attribute: 194 Temperature_Celsius changed from 253 [Raw 42] to 253 [Raw 44]

but the script "run" isn't executed.
 
Is there something wrong?
 
Thank you,
 Fabrizio
 


Bruce Allen | 9 Jul 10:25 2003

Re: Smartd doesn't execute scripts...

Ciao Fabrizio,

The script is only run if a problem is detected.  It's NOT run when an
Attribute value changes, since this is generally not a sign of a problem.  
If an Attribute fails (meaning that its normalized value is less than or
equal to the threshold value) THEN the script will be run.  But it WON'T
be run if the Attribute value simply changes, but does not fail.
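
The failure rule can be written down in a few lines of Python. This is a sketch of the rule as worded above, not smartd's actual code: the function name is made up, the threshold values are hypothetical because the log in the question does not show them, and any special-case handling in smartd is left out.

```python
def attribute_failed(normalized_value, threshold):
    """Rule as described above: the Attribute fails when its normalized
    value is less than or equal to the threshold value."""
    return normalized_value <= threshold

# In the reported log the normalized value of Attribute 194 stays at 253
# while only the raw value moves between 44 and 42.
HYPOTHETICAL_THRESHOLD = 0  # assumption: the real threshold is not shown

print(attribute_failed(253, HYPOTHETICAL_THRESHOLD))  # change only, no failure
print(attribute_failed(40, 45))                       # hypothetical failing values
```

Under this rule the temperature changes in the log are reported but never count as a failure, so the script is not called.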

To test that your script runs, please add:
  -M test
to your list of Directives.

Please let us know if this works OK.

Also, you might want to update your copy of smartmontools to the most
recent 5.1-14 release.

A presto,
	Bruce


Fabrizio Di Meo | 9 Jul 11:42 2003

Re: Smartd doesn't execute scripts...

Hi Bruce,
              thank you :o)
You've cleared up my doubts.
I would like to receive an alarm/email if my hard disk ever runs above some temperature threshold, but I can always use awk to get this.
 
The -M test directive works fine for me.
 
Unfortunately I lost a hard disk last week because of a failure that Activesmart (under Windows) did not detect, and I don't want to have such a problem under Linux (this machine hosts a RAID 1 configuration).
 
Even though I have read all the man pages, what I still don't understand is how smartctl and smartd interact with each other.
 
 
Thank you Bruce,
 Ciao
 
Fabrizio
 
 




Bruce Allen | 9 Jul 16:36 2003

Re: Smartd doesn't execute scripts...

>               thank you :o) You've solved my doubts. 

Un piacere!

> I would like to receive an alarm/email if my hard disk ever runs above
> some temperature threshold, but I can always use awk to get this.

Hmmm, not a bad idea.  I haven't got time to do this now, but I might put
it onto the TODO list.
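
Until something like that exists in smartd, the idea can be sketched outside it. The Python below scans smartd log lines of the form shown earlier in this thread and reports drives whose new raw temperature exceeds a limit. It is only an illustration: the function name and the 43-degree limit are made up, and it assumes the raw value of Attribute 194 is the temperature in Celsius, which depends on the drive.

```python
import re

# Matches the "Temperature_Celsius changed" lines that smartd logs.
LOG_PATTERN = re.compile(
    r"Device: (?P<device>[^,\s]+), SMART Usage Attribute: "
    r"194 Temperature_Celsius changed from \d+ \[Raw \d+\] "
    r"to \d+ \[Raw (?P<raw>\d+)\]"
)

def temperature_alerts(log_lines, limit_celsius):
    """Return (device, temperature) pairs whose new raw value exceeds the limit."""
    alerts = []
    for line in log_lines:
        match = LOG_PATTERN.search(line)
        if match and int(match.group("raw")) > limit_celsius:
            alerts.append((match.group("device"), int(match.group("raw"))))
    return alerts

# The two log lines reported in this thread.
sample_log = [
    "Device: /dev/hdd, SMART Usage Attribute: 194 Temperature_Celsius "
    "changed from 253 [Raw 44] to 253 [Raw 42]",
    "Device: /dev/hdd, SMART Usage Attribute: 194 Temperature_Celsius "
    "changed from 253 [Raw 42] to 253 [Raw 44]",
]
print(temperature_alerts(sample_log, limit_celsius=43))
```

Sending the mail is left out on purpose; the returned list could be handed to whatever notification mechanism is already in place.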

> The -M test directive works fine for me.

Good.

> Unfortunately I lost a hard disk last week because of a failure that
> Activesmart (under Windows) did not detect, and I don't want to have
> such a problem under Linux (this machine hosts a RAID 1 configuration).
>
> Even though I have read all the man pages, what I still don't
> understand is how smartctl and smartd interact with each other.

smartctl and smartd don't interact much.

smartctl:
        opens device
        sends command(s)
        closes device

smartd:
        opens device
        sends command(s)
        closes device
        sleeps 30 minutes
        repeats above steps

As long as smartctl and smartd don't try to send commands at the same
time, there is absolutely no problem.  Even if they do both try to send
commands at the same time, the disk will execute the correct command for
each process, and so they shouldn't interfere.
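
The smartd cycle listed above can be pictured as a plain polling loop. The Python sketch below is not how smartd is implemented. The function name, the check callback (standing in for "open device, send commands, close device") and the cycles and sleep parameters are all assumptions added so the loop can be tried without waiting 30 minutes.

```python
import time

def poll_devices(devices, check, interval_seconds=30 * 60,
                 cycles=None, sleep=time.sleep):
    """Check every device, sleep, and repeat.

    cycles=None repeats forever; a number stops after that many passes.
    """
    done = 0
    while cycles is None or done < cycles:
        for device in devices:
            # Stands in for: open device, send command(s), close device.
            check(device)
        done += 1
        if cycles is None or done < cycles:
            sleep(interval_seconds)

# One pass over the two drives from this thread, with print as the check.
poll_devices(["/dev/hda", "/dev/hdd"], check=print, cycles=1)
```

A one-off smartctl run corresponds to a single check call with no loop around it, which is why the two tools have so little to coordinate.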

Cheers,
	Bruce
