Christian Franke | 1 Nov 2007 22:08

Re: --service switch for any/all platforms

Sergey Svishchev wrote:
> On Tue, Jun 05, 2007 at 12:22:11PM -0700, Jesse Peterson wrote:
>> Hello,
>>
>> I'd like to suggest that a switch to prevent smartd from forking is  
>> useful across all platforms. My rationale for this is that I have  
>> daemon monitoring tools on different platforms (notably launchd[1] 
>> on  Darwin/Mac OS X) that depend on non-forking daemons.
>
> Back in 2005, Enrico Scholz posted a patch that did just that:
>
> http://osdir.com/ml/linux.utilities.smartmontools/2005-12/msg00049.html
>

Just for Info: Patch checked in. Option '-n, --no-fork' now available in 
CVS.

Christian

-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/
Bruce Allen | 2 Nov 2007 05:23

Re: Unusually high Load_Cycle_Counts

Hi Rui,

On Tue, 30 Oct 2007, Rui Tiago Cação Matos wrote:

> [ This mail won't be properly threaded since I wasn't subscribed at the
> time Theresa posted. ]
>
> Hi, my concerns are mostly the same as Theresa's. I have a Dell Latitude
> D630. smartctl reports that my drive isn't in the database, maybe you
> can add it now?

Please see the smartmontools FAQ about how to have your drive added.

> Attached is my smartctl output with around 24000 load
> cycles on around 170 hours of usage.
>
> Are those values correct? If I disable these spin up/down cycles my 
> disk's temperature increases from around 38 degrees to a mean of 45, can 
> this shorten its lifespan?

In general, yes, raising drive temperature does lower lifetime.  But so do 
large numbers of load/unload cycles!

I have read that in general, modern drives that use fluid bearings seem to 
be less sensitive to failure from elevated temperatures than older drives 
that used mechanical (ball) bearings.

For what it's worth, on my laptop I show 751000 cycles in 3000 operating 
hours.  Currently temperature 32 Celsius (drive spinning!).
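
To put the two reports on the same footing, the counts can be converted to 
an average rate.  A minimal Python sketch, using only the approximate 
figures quoted in this thread; the function name is made up for 
illustration and is not part of smartmontools:

```python
def load_cycles_per_hour(load_cycles: int, power_on_hours: int) -> float:
    """Average head load/unload cycles per power-on hour."""
    if power_on_hours <= 0:
        raise ValueError("power_on_hours must be positive")
    return load_cycles / power_on_hours


# Approximate figures as quoted in the thread: (load cycles, operating hours).
reports = {
    "Rui (Dell Latitude D630)": (24000, 170),
    "Bruce (laptop)": (751000, 3000),
}

for who, (cycles, hours) in reports.items():
    rate = load_cycles_per_hour(cycles, hours)
    print(f"{who}: about {rate:.0f} cycles/hour")
```

Both drives come out in the low hundreds of cycles per hour, so by this 
rough measure the two laptops are in the same range.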


Bruce Allen | 2 Nov 2007 06:02

Re: Unusually high Load_Cycle_Counts

Hi Theresa,

I just learned about this buzz from a colleague yesterday.

I don't have any experience with your Samsung drive.  I suggest that you 
run a short self-test '-t short' and wait until it completes.  The drive 
age should then be shown in the self test log.  Then experiment with the 
different -v and -F options to see how the drive is storing its lifetime.

IMPORTANT INFORMATION BELOW: PLEASE PASS BACK TO THE UBUNTU COMMUNITY

I think that the -B value of 255 is incorrect.  You should use 254 for 
maximum performance.  255 IS DOCUMENTED AS 'RESERVED' IN THE ATA/SATA 
SPECS.  THE BEHAVIOR OF -B 255 THUS IS NOT PREDICTABLE AND IT MAY HAVE NO 
EFFECT.  Also according to the ATA/SATA specs any value greater than or 
equal to 128 will 'not permit the device to spin down to save power'. So 
128 will reduce power use as much as possible but not permit spin-down.

References:
http://www.t13.org/Documents/UploadedDocuments/docs2007/D1532v1r4b-AT_Attachment_with_Packet_Interface_-_7_Volume_1.pdf
PDF page 273 Document page 253
Table 43 (and the paragraph immediately following it).

So I suggest you try some different -B values such as -B 254 or -B 128.

The hdparm man page says 'values of 255 will disable Advanced Power 
Management'.  I think this is a mistake in the man page.  According to the 
ATA/SATA specs referenced above, the value 255 is reserved and has vendor 
dependent meaning (or has no effect).
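
The ranges described above can be written down as a small lookup.  This is 
only a sketch of how I read the message and the referenced table, not code 
from hdparm or smartmontools; the function name is hypothetical, and 
treating 0 as reserved is my own reading of the table, so check it against 
the spec before relying on it:

```python
def describe_apm_level(value: int) -> str:
    """Describe an hdparm -B value using the ranges discussed in this thread."""
    if not 0 <= value <= 255:
        raise ValueError("APM level must be between 0 and 255")
    if value == 255:
        # Reserved in the ATA/SATA specs: behavior is vendor dependent.
        return "reserved"
    if value == 0:
        # Assumption: also reserved (not stated in the thread itself).
        return "reserved"
    if value == 254:
        return "maximum performance, spin-down not permitted"
    if value == 128:
        return "lowest power use without spin-down"
    if value > 128:
        return "intermediate level, spin-down not permitted"
    return "spin-down permitted"


for level in (127, 128, 254, 255):
    print(level, "->", describe_apm_level(level))
```

The point of the sketch is the boundary at 128: anything at or above it 
keeps the drive spinning, and 255 falls outside the defined range.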


Jon Hardcastle | 2 Nov 2007 09:56

Bug/Issue

I have discovered what I think is an issue with the
delay smartd allows before running a scheduled check
on a SPUN DOWN SATA DRIVE.

I get a message telling me that the drive isn't
capable in the log barely 5 secs after it has tried to
do the test - not enough time for the drive to spin
up! I have confirmed this by tricking smartd and
making sure the drive is up. Works fine.

Also I had previously seen problems with the 30min
checks whereby my log is filled with 'ata soft reset
errors' when it tries to 'check' a spun down disk.
Stopping it checking if the drive is sleeping 'fixed'
this.

Any comments? 

I have posted on Gentoo about this also...

http://forums.gentoo.org/viewtopic-p-4437926.html?sid=e8a8fdb5f0d9c8c847407559e9716713

-----------------------
N: Jon Hardcastle
E: Jon <at> eHardcastle.com
'The writing is on the wall...'
-----------------------


André Paulsberg | 2 Nov 2007 01:27

Re: smartd limit fix for new 3ware controllers

>> if I uncomment any of the 8 last drives I get this error message:
>> -----
>> smartd version 5.37 [x86_64-suse-linux-gnu] Copyright (C) 2002-6 Bruce Allen
>> Home page is http://smartmontools.sourceforge.net/
>> Opened configuration file /etc/smartd.conf
>> File /etc/smartd.conf line 102 (drive /dev/twa0): Directive -d 3ware,N (N=16) must have 0 <= N <= 15
>> File /etc/smartd.conf line 102 (drive /dev/twa0): Invalid argument to -d Directive: 3ware,16
>> Valid arguments to -d Directive are:
>> ata, scsi, marvell, removable, sat, 3ware,N, hpt,L/M/N
>>
>> Configuration file /etc/smartd.conf has fatal syntax errors.
>> -----
>>
>> It seems 3ware controllers are supported, but only up to the versions with 16 drives,
>> so smartd is monitoring only 66% of my drives.
>> Is there a possibility to have a fix for this problem to allow
>> for 3ware controllers with more than 16 disks ?
>
>
> Please build the current version of the code from CVS.
> That should support all of your drives.
>
> This is new code so please report back if it works (or does not work).

Your new V5.38 code fixed the 16-drive limit, and for now it looks to work perfectly.
Hopefully my drives will work perfectly too, to minimize the need for smartd :)

Thanks for your help, André
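
For anyone checking a large smartd.conf before upgrading, the range check 
behind the 5.37 error message quoted above can be mimicked in a few lines 
of Python.  This is a sketch only, not the smartd source: the function 
name is hypothetical, and since the upper bound used by the newer code is 
not stated in this thread, it is left as a parameter (31 in the last call 
is an arbitrary example value):

```python
import re


def parse_3ware_port(argument: str, max_port: int = 15) -> int:
    """Return N from a '3ware,N' argument, enforcing 0 <= N <= max_port.

    The default of 15 matches the limit in the 5.37 error message quoted
    in this thread; pass a larger max_port to model a newer build.
    """
    match = re.fullmatch(r"3ware,(\d+)", argument)
    if match is None:
        raise ValueError(f"not a 3ware,N argument: {argument!r}")
    port = int(match.group(1))
    if port > max_port:
        raise ValueError(
            f"Directive -d 3ware,N (N={port}) must have 0 <= N <= {max_port}"
        )
    return port


print(parse_3ware_port("3ware,15"))               # accepted with the old limit
print(parse_3ware_port("3ware,16", max_port=31))  # accepted only with a higher limit
```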


Bruce Allen | 2 Nov 2007 11:32

Re: Bug/Issue

Jon: thanks for the report.

Tejun: what are your thoughts about this?  Should the fix be at the kernel 
level or at the application level?  Did your recent libata changes already 
address this?

Cheers,
      Bruce

On Fri, 2 Nov 2007, Jon Hardcastle wrote:

> I have discovered what I think is an issue with the
> delay smartd allows before running a scheduled check
> on a SPUN DOWN SATA DRIVE.
>
> I get a message telling me that the drive isn't
> capable in the log barely 5 secs after it has tried to
> do the test - not enough time for the drive to spin
> up! I have confirmed this by tricking smartd and
> making sure the drive is up. Works fine.
>
> Also I had previously seen problems with the 30min
> checks whereby my log is filled with 'ata soft reset
> errors' when it tries to 'check' a spun down disk.
> Stopping it checking if the drive is sleeping 'fixed'
> this.
>
> Any comments?
>
> I have posted on Gentoo about this also...

Jon Hardcastle | 2 Nov 2007 11:43

Re: Bug/Issue

Hey, no probs.

I was worried smartmon might have been left to drift
and wasn't maintained.

Would've been a tragedy for such a fantastic suite!

Also note I found another bug... I mentioned it in
my post on Gentoo.

On my IDE drives I can see the progress of a test I kick
off. I can't with the SATA ones, but I think it updates
the list on completion.

Today at 3pm my drives are scheduled to check again.
No tricks this time. I have just not configured them
to spin down. I have a feeling it will work.

Cheers. 

--- Bruce Allen <ballen <at> gravity.phys.uwm.edu> wrote:

> Jon: thanks for the report.
> 
> Tejun: what are your thoughts about this?  Should the
> fix be at the kernel 
> level or at the application level?  Did your recent
> libata changes already 
> address this?
> 

Jon Hardcastle | 2 Nov 2007 12:44

Re: Bug/Issue

I am at work atm so I can't provide the precise .conf
settings and log errors, but the gist of it is as
follows.

I have my 5 drives (2 IDE, 3 SATA) configured to run a
short test Tues/Weds/Thurs/Sat/Sun and a long test
Fri/Mon at 3pm.

When it tries to do this the 3 SATA drives fail to do
so and I am notified of this via email. Please note that at
this stage the 3 SATA drives are RAIDed and used for data; the
2 IDE drives are also RAIDed but hold root. Come 3pm the data
drives will have almost certainly spun down.

Looking at the logs there are literally seconds (as in 2~5)
between when smartd kicks in on the SATA drives
and when it decides the drive isn't capable. I can
tell you now it takes at least 10~15 seconds for the
drive to spin up, probably longer. I also know that
when I run the test manually using smartctl it seems
to work, and that when the scheduled test is run when
the drives HAVEN'T spun down it also works.

I get the SATA soft reset error I have seen on these
forums and on the internet and it seems to be caused
by the drives spinning up and not responding
quickly enough (or something). I will know for sure at
3pm BST (currently 11:40) as the server is configured
to run the tests as per usual with me doing nothing to
'engineer' it. All I have done is stopped hdparm from
(Continue reading)

Tejun Heo | 2 Nov 2007 13:11

Re: [smartmontools-support] Bug/Issue

Hello, Jon.

Please don't top-post.

Jon Hardcastle wrote:
> Looking at the logs there are literally seconds (as in 2~5)
> between when smartd kicks in on the SATA drives
> and when it decides the drive isn't capable. I can
> tell you now it takes at least 10~15 seconds for the
> drive to spin up, probably longer. I also know that
> when I run the test manually using smartctl it seems
> to work, and that when the scheduled test is run when
> the drives HAVEN'T spun down it also works.

Okay, please do the following.

1. Post /var/log/boot.msg or dmesg result right after boot.
2. Post the result of "lspci -nnv"
3. Post the results of "hdparm -I /dev/sdX" where sdX is one of the
problematic drives.
4. Run "hdparm -y /dev/sdX" to spin it down then issue SMART short test
manually.  If it fails the same way, post what smartd/smartctl says and
the result of dmesg after the failure.
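
The four steps above could be collected by a small Python wrapper along
these lines.  It is a rough sketch, not a tested tool: the function names
are made up, "smartctl -t short" is assumed to be the manual short test
meant in step 4, and the first dmesg call only matches step 1 if the
script is run right after boot.  It needs root, and it really does spin
the drive down.

```python
import subprocess
import sys
from typing import List


def diagnostic_commands(device: str) -> List[List[str]]:
    """Commands for steps 1-4, in the order they were requested."""
    return [
        ["dmesg"],                            # step 1 (only valid right after boot)
        ["lspci", "-nnv"],                    # step 2
        ["hdparm", "-I", device],             # step 3
        ["hdparm", "-y", device],             # step 4: spin the drive down
        ["smartctl", "-t", "short", device],  # step 4: start a short self-test
        ["dmesg"],                            # step 4: kernel messages afterwards
    ]


def run_all(device: str, log=sys.stdout) -> None:
    """Run each command and write its output under a header line."""
    for argv in diagnostic_commands(device):
        log.write("\n### " + " ".join(argv) + "\n")
        try:
            result = subprocess.run(argv, capture_output=True, text=True)
            log.write(result.stdout + result.stderr)
        except FileNotFoundError:
            log.write("command not found\n")


# Example (as root), with the real device name in place of /dev/sdX:
#     run_all("/dev/sdX")
```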

Thanks.

--

-- 
tejun


Gmane