Jens Schweikhardt | 1 May 2004 10:56

bugs in smartd.sh script

Bruce,

I could not find any info in the man page how and where to report bugs.
The sourceforge site also doesn't appear to have a bug database, or I
had difficulties spotting it, so here goes my report.

There are two problems with the smartd.sh.sample script as shipped for
FreeBSD, $Id: smartd.initd.in,v 1.21 2004/03/05 14:55:14 ballen4705 Exp $

- /usr/local/sbin is not in the PATH when the system startup scripts
  are run, so we need to use the full path name /usr/local/sbin/smartd.

- The echo -n "smartd " should have the space not appended, but prepended.
  I think this is true not only for FreeBSD, but other OSs as well.

- The script has a lot of useless whitespace at end-of-lines. If you happen
  to use vim6, put

  match Todo /\s\+$/

  in your .vimrc to make it jump right in your face :-) Emacs has surely
  something similar.
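
  For those without vim, a rough shell sketch of the same check follows. It
  assumes a POSIX grep; the helper name list_trailing_ws is made up for
  illustration and is not part of smartmontools.

```shell
#!/bin/sh
# list_trailing_ws (hypothetical helper): read a script on stdin and print
# "lineno:text" for each line that ends in whitespace (space, tab, or CR).
list_trailing_ws() {
    grep -n '[[:space:]]$'
}

# Example, using the script name mentioned above:
#   list_trailing_ws < smartd.sh.sample
```

  The function returns a non-zero status when no line matches, so it can
  also serve as a quick pass/fail check before committing.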

Regards,

	Jens
--

-- 
Jens Schweikhardt http://www.schweikhardt.net/
SIGSIG -- signature too long (core dumped)

Fredrik Persson | 2 May 2004 16:37

'smartctl -t long /dev/hdh' killed my Samsung SV1604N

Hello!

I'm new to this list, but I've browsed the archive for my particular problem 
before posting. I've got a Samsung SV1604N (160GB, 5400rpm) that I ran the 
long test on. (Like so: 'smartctl -t long', perhaps I should've included '-F 
samsung'?)

It completely KILLED the HD!

After about an hour, this started to turn up when doing 'dmesg':

May  2 13:23:30 rostig kernel: hdh: irq timeout: status=0xd0 { Busy }
May  2 13:23:31 rostig kernel: hdh: status timeout: status=0xd0 { Busy }
May  2 13:23:31 rostig kernel: hdh: drive not ready for command
May  2 13:23:32 rostig kernel: hdh: status timeout: status=0xd0 { Busy }
May  2 13:23:32 rostig kernel: hdh: drive not ready for command
May  2 13:23:33 rostig kernel: hdh: status timeout: status=0xd0 { Busy }
May  2 13:23:33 rostig kernel: hdh: drive not ready for command

Not good. I've also configured SMART to send me emails. I received four of 
those, within a four-second period starting at 13:23:30.

First:

The following warning/error was logged by the smartd daemon:
Device: /dev/hdh, not capable of SMART self-check

Second:

The following warning/error was logged by the smartd daemon:

Eduard Martinescu | 2 May 2004 21:34

Re: bugs in smartd.sh script

Jens,

I have some fixes ready for the next release of Smartmontools, but they are not in the current release that is part of the FreeBSD Ports yet. 

Ed

On Sat, 2004-05-01 at 04:56, Jens Schweikhardt wrote:
> Bruce,
>
> I could not find any info in the man page how and where to report bugs.
> The sourceforge site also doesn't appear to have a bug database, or I
> had difficulties spotting it, so here goes my report.
>
> There are two problems with the smartd.sh.sample script as shipped for
> FreeBSD, $Id: smartd.initd.in,v 1.21 2004/03/05 14:55:14 ballen4705 Exp $
>
> - /usr/local/sbin is not in the PATH when the system startup scripts
>   are run, so we need to use the full path name /usr/local/sbin/smartd.
>
> - The echo -n "smartd " should have the space not appended, but prepended.
>   I think this is true not only for FreeBSD, but other OSs as well.
>
> - The script has a lot of useless whitespace at end-of-lines. If you happen
>   to use vim6, put
>
>   match Todo /\s\+$/
>
>   in your .vimrc to make it jump right in your face :-) Emacs has surely
>   something similar.
>
> Regards,
>
> 	Jens
--
Eduard Martinescu <martines <at> rochester.rr.com>
Bruce Allen | 3 May 2004 17:26

Re: 'smartctl -t long /dev/hdh' killed my Samsung SV1604N

Hi Fredrik,

On Sun, 2 May 2004, Fredrik Persson wrote:

> I'm new to this list, but I've browsed the archive for my particular
> problem before posting. I've got a Samsung SV1604N (160GB, 5400rpm)
> that I ran the long test on. (Like so: 'smartctl -t long', perhaps I
> should've included '-F samsung'?)
> 
> It completely KILLED the HD!

I'm sorry to hear this.  If it's any consolation, the disk would have died
anyway -- the long self-test was simply the little bit of extra load that
pushed the disk past its failure point.

Was there any prior sign that the disk was 'in trouble'?

The long self-test read scans the entire disk surface.  If the disk has an
electronic or mechanical problem, then this extended read scan can provoke
failure.  (This type of failure is also commonly seen when people back up
disks.  Because the load of reading all the data from the disk is a heavy
one, it often leads to catastrophic failure in the middle of the backup.
This is why you should always have a PAIR of backups, and overwrite the
older of the two, but preserve the newer of the two.)

Before you give up on the disk, double check the power and signal cabling
to be sure that nothing has worked loose.  Additional comments below.

> After about an hour, this started to turn up when doing 'dmesg':
> 
> May  2 13:23:30 rostig kernel: hdh: irq timeout: status=0xd0 { Busy }
> May  2 13:23:31 rostig kernel: hdh: status timeout: status=0xd0 { Busy }
> May  2 13:23:31 rostig kernel: hdh: drive not ready for command
> May  2 13:23:32 rostig kernel: hdh: status timeout: status=0xd0 { Busy }
> May  2 13:23:32 rostig kernel: hdh: drive not ready for command
> May  2 13:23:33 rostig kernel: hdh: status timeout: status=0xd0 { Busy }
> May  2 13:23:33 rostig kernel: hdh: drive not ready for command

The drive simply stopped responding to commands.

> Not good. I've also configured SMART to send me emails. I received four of 
> those, within a four-second period starting at 13:23:30.
> 
> First:
> 
> The following warning/error was logged by the smartd daemon:
> Device: /dev/hdh, not capable of SMART self-check
> 
> Second:
> 
> The following warning/error was logged by the smartd daemon:
> Device: /dev/hdh, failed to read SMART Attribute Data
> 
> Third:
> 
> The following warning/error was logged by the smartd daemon:
> Device: /dev/hdh, Read SMART Error Log Failed
> 
> Fourth:
> 
> The following warning/error was logged by the smartd daemon:
> Device: /dev/hdh, Read SMART Self Test Log Failed

These four messages are because the disk wasn't reachable any more.

> After that, 'smartctl -a /dev/hdh/' claimed that /dev/hdh wasn't able to do 
> SMART-communication. I then rebooted the machine. Now, the drive won't even 
> show up. 'dmesg' shows this:
> 
> hda: Conner Peripherals 850MB - CFS850A, ATA DISK drive
> hdc: SAMSUNG SV1204H, ATA DISK drive
> hde: WDC WD1200AB-00CBA1, ATA DISK drive
> hdf: WDC WD1200AB-00CBA1, ATA DISK drive
> hdg: Maxtor 6Y120L0, ATA DISK drive
> 
> No hdh anywhere.

As I said, double check the power and signal cabling. But they are
probably OK -- this looks like a straightforward electronic (not
mechanical) drive failure.

> Disaster. What can possibly have happened here? The HD was fairly new
> (just a few months old) has NOT been running 24/7 or anything like
> that although it's been running for 5-8 hours every day.

Really there are just three possibilities.  (1) The additional load of a
self-test provoked catastrophic failure (would have happened anyway, when
the disk was under load in the future) (2) sudden electrical failure
unrelated to self-test (eg, voltage spike killed a chip in the disk) or
(3) cabling problems (do double check to eliminate this possibility).

> Any help or hints about this problem would be greatly appreciated,

If the disk has failed (and it's just a few months old) it should still be
under warranty.  Hopefully you can re-create the data that was on it.

Cheers,	
    Bruce

-------------------------------------------------------
This SF.Net email is sponsored by: Oracle 10g
Get certified on the hottest thing ever to hit the market... Oracle 10g. 
Take an Oracle 10g class now, and we'll give you the exam FREE. 
http://ads.osdn.com/?ad_id=3149&alloc_id=8166&op=click
Bruce Allen | 3 May 2004 17:47

Re: bugs in smartd.sh script

Hi Jens,

> I could not find any info in the man page how and where to report bugs.

Doesn't your man page say this?

HOME PAGE FOR SMARTMONTOOLS:
       Please see the following web site for updates,  further
       documentation, bug reports and patches:
       http://smartmontools.sourceforge.net/

> The sourceforge site also doesn't appear to have a bug database, or I
> had difficulties spotting it, so here goes my report.

In the FAQ section, there is an entry that says:
  What do I do if I have problems, or need support?

> There are two problems with the smartd.sh.sample script as shipped for
> FreeBSD, $Id: smartd.initd.in,v 1.21 2004/03/05 14:55:14 ballen4705 Exp $
> 
> - /usr/local/sbin is not in the PATH when the system startup scripts
>   are run, so we need to use the full path name /usr/local/sbin/smartd.

I think this has already been fixed in CVS.

> - The echo -n "smartd " should have the space not appended, but prepended.
>   I think this is true not only for FreeBSD, but other OSs as well.

Could you please explain this further, and comment on this:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=107454

> - The script has a lot of useless whitespace at end-of-lines. If you happen
>   to use vim6, put
> 
>   match Todo /\s\+$/
> 
>   in your .vimrc to make it jump right in your face :-) Emacs has surely
>   something similar.

I'll clean these up -- thanks for pointing it out.

Cheers,
	Bruce

Fredrik Persson | 3 May 2004 22:51

Re: 'smartctl -t long /dev/hdh' killed my Samsung SV1604N

Hello, and thanks for your quick reply.

Short: it came back to life! How? I shut it down in the evening and started it 
again about 12 hours later, and there the disk was, alive and kicking. So the 
case went like this: booted the machine, ran the long self test, got the 
errors I described below, rebooted the machine to see if that got the drive 
working. It didn't, it got worse, the drive didn't exist at all 
(no /dev/hdh). Turned it off, waited 12 hours, turned it on and everything 
was back to normal.

Before you dismiss me as a nutcase, please read the comments below. However, 
what I'd *really* like to know is this: would '-F samsung' have made any 
difference when I ran the long selftest?

On Monday 03 May 2004 17.26, Bruce Allen wrote:
> Hi Fredrik,
>
> On Sun, 2 May 2004, Fredrik Persson wrote:
> > I'm new to this list, but I've browsed the archive for my particular
> > problem before posting. I've got a Samsung SV1604N (160GB, 5400rpm)
> > that I ran the long test on. (Like so: 'smartctl -t long', perhaps I
> > should've included '-F samsung'?)
> >
> > It completely KILLED the HD!
>
> I'm sorry to hear this.  If it's any consolation, the disk would have died
> anyway -- the long self-test was simply the little bit of extra load that
> pushed the disk past its failure point.
>
> Was there any prior sign that the disk was 'in trouble'?

Maybe. This is what I get from 'smartctl -a -F samsung /dev/hdh': (sorry about 
the linebreaks, I hope it's still readable.)

----------------------------------------------

smartctl version 5.1-18 Copyright (C) 2002-3 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     SAMSUNG SV1604N
Serial Number:    S01FJ10X102037
Firmware Version: TR100-24
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 0
Local Time is:    Mon May  3 22:32:06 2004 CEST

==> WARNING: Contact developers; may need -F samsung enabled.

SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Off-line data collection status: (0x00) Offline data collection activity was
                                        never started.
                                        Auto Off-line Data Collection: Disabled.
Self-test execution status:      (  39) The self-test routine was interrupted
                                        by the host with a hard or soft reset.
Total time to complete off-line
data collection:                 (7200) seconds.
Offline data collection
capabilities:                    (0x1b) SMART execute Offline immediate.
                                        Automatic timer ON/OFF support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        No General Purpose Logging support.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 120) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   073   070   000    Pre-fail  Always       -       4864
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       171
  5 Reallocated_Sector_Ct   0x0033   253   253   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   253   253   051    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0024   253   253   000    Old_age   Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       123448
 10 Spin_Retry_Count        0x0013   253   253   049    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       101
194 Temperature_Celsius     0x0022   169   115   000    Old_age   Always       -       23
195 Hardware_ECC_Recovered  0x000a   100   100   000    Old_age   Always       -       11375294
196 Reallocated_Event_Count 0x0012   253   253   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0033   253   253   010    Pre-fail  Always       -       0
198 Offline_Uncorrectable   0x0031   253   253   010    Pre-fail  Offline      -       0
199 UDMA_CRC_Error_Count    0x000b   100   100   051    Pre-fail  Always       -       1
200 Multi_Zone_Error_Rate   0x000b   100   100   051    Pre-fail  Always       -       0
201 Soft_Read_Error_Rate    0x000b   100   100   051    Pre-fail  Always       -       0

SMART Error Log Version: 1
Warning: ATA error count 1 inconsistent with error log pointer 5

ATA Error Count: 1
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Timestamp = decimal seconds since the previous disk power-on.
Note: timestamp "wraps" after 2^32 msec = 49.710 days.

Error 1 occurred at disk power-on lifetime: 0 hours
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 51 00 01 00 00 a0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Timestamp  Command/Feature_Name
  -- -- -- -- -- -- -- --   ---------  --------------------
  b1 c0 00 01 00 00 a0 00 1663959.040  DEVICE CONFIGURATION RESTORE
  ec 00 03 01 00 00 a0 00 1663959.040  IDENTIFY DEVICE
  91 00 3f 01 00 00 af 00 1663959.040  INITIALIZE DEVICE PARAMETERS [OBS-6]
  10 00 00 01 00 00 a0 00 1663959.040  RECALIBRATE [OBS-4]
  ec 00 01 01 00 00 a0 00  623771.648  IDENTIFY DEVICE

SMART Self-test log structure revision number 1
No self-tests have been logged

----------------------------------------------

I think there are a few interesting things to note here:

1. The self-test execution status. It says it was interrupted by the host with a 
hard or soft reset after 39 minutes, which sounds correct according to what I 
saw when it happened. So the disk acknowledges that something went wrong, the 
question is what?

2. There's a SMART attribute called "Hardware_ECC_Recovered", with the value 
11375294. I'm not sure what this means, but ECC should be some kind of error 
correction, and the value is high.

3. The "UDMA_CRC_Error_Count" is 1. Could this have happened during the failed 
self-test, or even be the cause of it? If so, what could have triggered this 
error?

4. There is one error in the log, which seems to have occurred the first time 
the disk was powered up.

Apart from this, I cannot see anything that could've caused this error.

> The long self-test read scans the entire disk surface.  If the disk has an
> electronic or mechanical problem, then this extended read scan can provoke
> failure.  (This type of failure is also commonly seen when people back up
> disks.  Because the load of reading all the data from the disk is a heavy
> one, it often leads to catastrophic failure in the middle of the backup.
> This is why you should always have a PAIR of backups, and overwrite the
> older of the two, but preserve the newer of the two.)
>
> Before you give up on the disk, double check the power and signal cabling
> to be sure that nothing has worked loose.  Additional comments below.

Power and signal cabling are untouched, and the disk is working again. I 
didn't even open the machine.

> > After about an hour, this started to turn up when doing 'dmesg':
> >
> > May  2 13:23:30 rostig kernel: hdh: irq timeout: status=0xd0 { Busy }
> > May  2 13:23:31 rostig kernel: hdh: status timeout: status=0xd0 { Busy }
> > May  2 13:23:31 rostig kernel: hdh: drive not ready for command
> > May  2 13:23:32 rostig kernel: hdh: status timeout: status=0xd0 { Busy }
> > May  2 13:23:32 rostig kernel: hdh: drive not ready for command
> > May  2 13:23:33 rostig kernel: hdh: status timeout: status=0xd0 { Busy }
> > May  2 13:23:33 rostig kernel: hdh: drive not ready for command
>
> The drive simply stopped responding to commands.
>
> > Not good. I've also configured SMART to send me emails. I received four
> > of those, within a four-second period starting at 13:23:30.
> >
> > First:
> >
> > The following warning/error was logged by the smartd daemon:
> > Device: /dev/hdh, not capable of SMART self-check
> >
> > Second:
> >
> > The following warning/error was logged by the smartd daemon:
> > Device: /dev/hdh, failed to read SMART Attribute Data
> >
> > Third:
> >
> > The following warning/error was logged by the smartd daemon:
> > Device: /dev/hdh, Read SMART Error Log Failed
> >
> > Fourth:
> >
> > The following warning/error was logged by the smartd daemon:
> > Device: /dev/hdh, Read SMART Self Test Log Failed
>
> These four messages are because the disk wasn't reachable any more.
>
> > After that, 'smartctl -a /dev/hdh/' claimed that /dev/hdh wasn't able to
> > do SMART-communication. I then rebooted the machine. Now, the drive won't
> > even show up. 'dmesg' shows this:
> >
> > hda: Conner Peripherals 850MB - CFS850A, ATA DISK drive
> > hdc: SAMSUNG SV1204H, ATA DISK drive
> > hde: WDC WD1200AB-00CBA1, ATA DISK drive
> > hdf: WDC WD1200AB-00CBA1, ATA DISK drive
> > hdg: Maxtor 6Y120L0, ATA DISK drive
> >
> > No hdh anywhere.
>
> As I said, double check the power and signal cabling. But they are
> probably OK -- this looks like a straightforward electronic (not
> mechanical) drive failure.

Cabling untouched, and the disk works again as it has for months. 

I'm curious; does this happen often? I mean, where the disk gets an error like 
this and then works again after 12 hours switched off?

> > Disaster. What can possibly have happened here? The HD was fairly new
> > (just a few months old) has NOT been running 24/7 or anything like
> > that although it's been running for 5-8 hours every day.
>
> Really there are just three possibilities.  (1) The additional load of a
> self-test provoked catastrophic failure (would have happened anyway, when
> the disk was under load in the future) (2) sudden electrical failure
> unrelated to self-test (eg, voltage spike killed a chip in the disk) or
> (3) cabling problems (do double check to eliminate this possibility).

I did run selftests on three other disks simultaneously, and they finished 
fine. Cabling problem is not very probable, and voltage spikes are extremely 
rare here. (Sweden)

> > Any help or hints about this problem would be greatly appreciated,
>
> If the disk has failed (and it's just a few months old) it should still be
> under warranty.  Hopefully you can re-create the data that was on it.

The disk is alive so I can take a backup now. However, won't I have a 
difficult time claiming warranty since it is fully functional now? Would you 
have tried to get a new disk if you were in my shoes?

>
> Cheers,
>     Bruce
>

Bruce, thank you very much for this very extensive reply! 

Best Regards

Fredrik Persson

Bruce Allen | 3 May 2004 23:19

Re: 'smartctl -t long /dev/hdh' killed my Samsung SV1604N

Hi Fredrik,

> Hello, and thanks for your quick reply.
> 
> Short: it came back to life! How? I shut it down in the evening and started it 
> again about 12 hours later, and there the disk was, alive and kicking. So the 
> case went like this: booted the machine, ran the long self test, got the 
> errors I described below, rebooted the machine to see if that got the drive 
> working. It didn't, it got worse, the drive didn't exist at all 
> (no /dev/hdh). Turned it off, waited 12 hours, turned it on and everything 
> was back to normal.

I'd try another long self-test to see what happens.

> Before you dismiss me as a nutcase, please read the comments below. However, 
> what I'd *really* like to know is this: would '-F samsung' have made any 
> difference when I ran the long selftest?

None.  -F samsung only affects the interpretation of the results from the
error and self-test logs.  It doesn't affect how a self-test is done.

> 199 UDMA_CRC_Error_Count    0x000b   100   100   051    Pre-fail  Always       -       1

This is a sign of a cabling problem.  Check your cables.

> SMART Error Log Version: 1
> Warning: ATA error count 1 inconsistent with error log pointer 5

You probably need -F samsung2 (use release 5.30 of smartmontools).

> SMART Self-test log structure revision number 1
> No self-tests have been logged

You should now show a self-test logged.  If not, try -F samsung and -F
samsung2.

> 1. The self-test execution status. It says it was interrupted by the host with a 
> hard or soft reset after 39 minutes, which sounds correct according to what I 
> saw when it happened. So the disk acknowledges that something went wrong, the 
> question is what?

Could be a cabling problem.

> 2. There's a SMART attribute called "Hardware_ECC_Recovered", with the value 
> 11375294. I'm not sure what this means, but ECC should be some kind of error 
> correction, and the value is high.

Ignore it.

> 3. The "UDMA_CRC_Error_Count" is 1. Could this have happened during the failed 
> self-test, or even be the cause of it? If so, what could have triggered this 
> error?

Cabling problem.

> Power and signal cabling are untouched, and the disk is working again. I 
> didn't even open the machine.

Consistent with an intermittent cable or power connection.  Check the
cabling.

> Cabling untouched, and the disk works again as it has for months. 

I suggest you check the cabling.

> I'm curious; does this happen often? I mean, where the disk gets an error like 
> this and then works again after 12 hours switched off?

It sounds like an intermittent electrical or signal connection.  Check the
power and signal cables.

> I did run selftests on three other disks simultaneously, and they finished 
> fine. Cabling problem is not very probable, and voltage spikes are extremely 
> rare here. (Sweden)

The UDMA CRC count is an indication of a cabling problem.

> The disk is alive so I can take a backup now. However, won't I have a 
> difficult time claiming warranty since it is fully functional now? Would you 
> have tried to get a new disk if you were in my shoes?

No.  I'd check the cables (unplug and replug, or change signal
cables) then run a long self-test.  It should appear in the logs with -F
samsung or -F samsung2.
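
To see whether the UDMA CRC count keeps climbing after the cables have been
reseated, one could pull the raw value out of the attribute table before and
after the test.  This is only a sketch: the helper name attr_raw is made up,
and it assumes each attribute sits on one unwrapped line with the raw value
in the tenth column, as in the smartctl attribute listing.

```shell
#!/bin/sh
# attr_raw ID (hypothetical helper): read smartctl attribute rows on stdin
# and print the RAW_VALUE column of the first row whose ID# matches ID.
# Assumes unwrapped rows: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED
# WHEN_FAILED RAW_VALUE, i.e. the raw value is field 10.
attr_raw() {
    awk -v id="$1" '$1 == id { print $10; exit }'
}

# Example (device name and -F option taken from this thread; adjust both):
#   smartctl -A -F samsung2 /dev/hdh | attr_raw 199
```

If the number is still 1 after a complete long self-test, the single CRC
error was probably a one-off; if it grows, suspect the signal cable.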

Cheers,
	Bruce

Bruce Allen | 3 May 2004 23:24

Re: bugs in smartd.sh script

> # In the FAQ section, there is an entry that says:
> #   What do I do if I have problems, or need support?
> 
> This appears to make the barrier for bug reports quite high IMHO:
> subscribing to a mailing list. I think bug reporting should be made as
> easy as possible. This means in prominent places and with no barriers. I
> understand however that providing an email address or a mailto: link may
> not be desirable due to spam.

It says (note final sentence):
First, search the support mailing list archives to see if your question
has been answered. Instructions are in the following paragraph. If you
don't find an answer there, then please send an email to the
smartmontools-support mailing list. This is a moderated forum: you are not
required to subscribe to the list in order to post your question.

I would like users to first search the mailing list archives before
writing.  Otherwise we simply end up answering the same questions over and
over again.

> # > - The echo -n "smartd " should have the space not appended, but prepended.
> # >   I think this is true not only for FreeBSD, but other OSs as well.
> # 
> # Could you please explain this further, and comment on this:
> # https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=107454
> 
> Well, the BSD way for startup script output is:
> 1. The framework does echo -n "Starting local daemons:"
> 2. Each local daemon is started and does echo -n " foobard"
> 3. The framework does echo "."
> (Similar for shutdown)
> 
> This leads to output along
> 
> Starting local daemons: apache privoxy smokeping smartd.
> 
> Currently it looks like this:
> Starting local daemons: apache privoxy smokepingsmartd .

OK.  I think that this needs only to be fixed for FreeBSD, not for
Solaris, Linux, NetBSD, etc.
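
For reference, the convention Jens describes can be mimicked with a few
lines of shell.  This is a stand-alone illustration, not the shipped
smartd.sh and not the real FreeBSD rc framework; the function names are
invented for the example.

```shell
#!/bin/sh
# Stand-alone imitation of the BSD startup output convention (hypothetical
# function names; the real framework and port script look different).

# The framework prints the prefix without a trailing newline or space.
framework_begin() { echo -n "Starting local daemons:"; }

# Each local script prints its own name with the space PREPENDED.
announce() { echo -n " $1"; }

# The framework finishes the line.
framework_end() { echo "."; }

startup_line() {
    framework_begin
    for daemon in "$@"; do
        announce "$daemon"
    done
    framework_end
}

startup_line apache privoxy smokeping smartd
```

With the space appended instead ("smartd "), the last two names run together
and a stray space lands before the final dot, which is the output Jens
reported.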

> As for https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=107454,
> If it is intended also for FreeBSD, it would be nice to retain the
> old behavior of just echo -n " smartd" if it is called from within
> the startup framework. The interactive case can be as verbose as you
> like.
> 
> It might well be the BSD port maintainer decides to provide a complete
> implementation suited for BSD. I'm not involved in these matters; I just
> noticed the run-together smokepingsmartd and wanted to let you know.

Thank you.  It should be easy to fix.  Ed?

Cheers,
	Bruce

Bruce Allen | 4 May 2004 17:54

Re: Re: [Smartmontools-database]FUJITSU MPB3043ATU

> > Unfortunately there is an obvious bug in the Fujitsu firmware on this
> > disk: the seconds counter is obviously wrapping back to zero after it gets
> > to some large value.
> 
> Are there disk drive manufacturers actually monitoring this list and/or
> the database of broken drives?

Yes, there are some vendor firmware people/test engineers subscribed to
the list.  They are usually quite responsive once they have understood and
reproduced a given problem.

Cheers,
	Bruce

Bruce Allen | 4 May 2004 17:58

Re: [smartmontools-database] device model: "FUJITSU MPG3204AT E"

> Hello, Bruce!
> 
>  On Mon, May 03, 2004 at 11:11:35AM -0500, Bruce Allen wrote:
> 
> > > Please find attached the output of smartctl -a /dev/hdX.
> > 
> > Note: until it's added to the database your disk needs -v 9,seconds
> 
> Thank you for the hint.  I find the resulting value strange, though.  
> Because it gives me 8 and something months of "power on" time, and it
> does not correspond to the actual time this computer is on (almost all
> the time for approximately 3 years, excluding the time for h/w
> upgrades and such), nor does it correspond to uptime (39 days).  Just
> curious whether this could be some kind of "abstract units"?

Have a look at the smartmontools FAQ at
http://smartmontools.sf.net/.  You'll see that there are a number of
firmware bugs related to Attribute 9 (though none are Fujitsu).  I suggest
that you monitor the value of Attribute 9 over a period of days or weeks
to understand it.

Then send us a note to explain!
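
One way to do that monitoring is to note the raw value of Attribute 9 twice,
a known time apart, and work out how fast it advances.  The sketch below
assumes the disk stays powered on between the two readings; the helper name
counts_per_second and the sample numbers are made up for illustration.

```shell
#!/bin/sh
# counts_per_second RAW_BEFORE RAW_AFTER ELAPSED_SECONDS (hypothetical helper)
# Prints how far the raw Attribute 9 value advanced per second of
# wall-clock time between two readings.
counts_per_second() {
    LC_ALL=C awk -v a="$1" -v b="$2" -v t="$3" \
        'BEGIN { printf "%.4f\n", (b - a) / t }'
}

# A counter kept in seconds advances by about 1.0000 per second, one kept
# in minutes by about 0.0167, and one kept in hours by about 0.0003.
# Example with made-up readings taken one day (86400 s) apart:
#   counts_per_second 1000 87400 86400
```

A negative or otherwise implausible rate would hint that the counter wrapped
or was reset between the two readings.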

Cheers,
	Bruce
