Lars Marowsky-Bree | 2 Jun 2003 14:08

sles8 1.0.3pre RPMs build

Hi,

in anticipation of the 1.0.3 release, I've put some RPMs for _testing_
up at ftp://ftp.suse.com/pub/people/lmb/heartbeat/sles8-i386/. I'll keep
updating them there until we have reached 1.0.3 final.

(Please be aware that they are testing only and not official upgrades
etc...)

Please spare a second or two to verify whether they work for you too, and
whether your favorite bugfix is in:

* Mon Jun 02 2003  Lars Marowsky-Bree <lmb <at> suse.de> (see doc/AUTHORS file)
+ Version 1.0.3:
  + Bugfix for heartbeat uptimes of >246 days
  + Bugfix for correctly locking heartbeat into memory for soft-realtime.
  + Bugfix for hang in the heartbeat API code.
  + Bugfix to findif: IPaddr should now work correctly for /32
    addresses.
  + Bugfix to prevent the serial link from becoming the controlling tty.
  + Update for ServeRAID resource script to work correctly with current
    ServeRAID driver + firmware.
  + Fix for shutdown ordering: Release resources first, then stop
    managed clients.
  + Several CCM fixes.
  + Documentation updates & fixes, new manpages for meatclient and 
    supervise-ldirectord
  + Updates to Debian packaging.

Those fixes were contributed by our great community, and I'd like to
[...]

Mark Watts | 2 Jun 2003 15:09

Simultaneous heartbeat startup weirdness...


Not sure if this is a bug or one of those things you can't fix, but...

I have two identical directors (same hardware and os etc). If I power them on 
at the same time (eg: after a power outage) when heartbeat starts on both 
boxes, they _both_ take the VIP and both become the active node.
Restarting heartbeat on one of the boxes fixes this, but it's something I have 
to do manually.

Is there a way to stagger the startup times or add a startup delay to a config 
file or something?

Cheers,

Mark.

P.S. All my previous problems are fixed and it's all running smoothly :)

-- 
Mark Watts
Systems Engineer
QinetiQ TIM
St Andrews Road, Malvern
GPG Public Key ID: 455420ED

Dan Kendall | 2 Jun 2003 15:53

RE: Simultaneous heartbeat startup weirdness...

I think the initdead config option is your man here.  IIRC if node A is
config'd to start the VIP (or any resource) then node B should wait initdead
time before it checks to see if the VIP (resource) is acquired on node A...
I think
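
For reference, a minimal sketch of the relevant /etc/ha.d/ha.cf lines. The
values are illustrative assumptions, not taken from Mark's setup. As far as I
understand it, initdead is the dead time applied only when heartbeat first
starts, so that slow network bring-up after a reboot does not make each node
declare the other dead; the sample ha.cf suggests at least twice deadtime.

```
# /etc/ha.d/ha.cf (illustrative values only; keep the file the same on both nodes)
# seconds between heartbeats
keepalive 2
# seconds of silence before a peer is declared dead
deadtime 30
# dead time used only at initial startup (at least 2 x deadtime)
initdead 120
nice_failback on
```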

Dan

> -----Original Message-----
> From: Mark Watts [mailto:m.watts <at> eris.qinetiq.com]
> Sent: 02 June 2003 14:09
> To: linux-ha <at> muc.de
> Subject: Simultaneous heartbeat startup weirdness...
> 
> 
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> 
> Not sure if this is a bug or one of those things you can't fix, but...
> 
> I have two identical directors (same hardware and os etc). If 
> I power them on 
> at the same time (eg: after a power outage) when heartbeat 
> starts on both 
> boxes, they _both_ take the VIP and both become the active node.
> Restarting heartbeat on one of the boxes fixes this, but it's 
> something I have 
> to do manually.
> 
> Is there a way to stagger the startup times or add a startup 

Lars Marowsky-Bree | 2 Jun 2003 15:51

Re: Problems with STONITH via apcsmart

On 2003-05-11T14:24:18,
   ha-users <at> yieldtech.cz said:

BTW, the

> May  9 19:57:58 cl2 heartbeat[8716]: ERROR: STONITH of  V^H^H?ß^RB failed.  Retrying...
> May  9 19:57:58 cl2 heartbeat[9528]: info: Resetting node cl1.YieldTech.cz with [APCSmart-Stonith]

message, which suggested some memory corruption or similar, turns out to
have been a corrupted log message due to a mistake in the error path;
the error is fixed in CVS, and will be fixed in 1.0.3, but until then,
just ignore that one and pretend that instead of the binary garbage, the
real hostname gets printed ;-) It doesn't affect operation at all.

Sincerely,
    Lars Marowsky-Brée <lmb <at> suse.de>

-- 
SuSE Labs - Research & Development, SuSE Linux AG

"If anything can go wrong, it will." "Chance favors the prepared (mind)."
  -- Capt. Edward A. Murphy            -- Louis Pasteur

Alan Robertson | 2 Jun 2003 16:00

Re: Simultaneous heartbeat startup weirdness...

Mark Watts wrote:
> 
> Not sure if this is a bug or one of those things you can't fix, but...
> 
> I have two identical directors (same hardware and os etc). If I power them on 
> at the same time (eg: after a power outage) when heartbeat starts on both 
> boxes, they _both_ take the VIP and both become the active node.
> Restarting heartbeat on one of the boxes fixes this, but it's something I have 
> to do manually.
> 
> Is there a way to stagger the startup times or add a startup delay to a config 
> file or something?

This is not a known bug in recent releases.  What version are you running? 
Are you running nice_failback on or off?

I test specifically for this - about 1000 times before each stable new release.

-- 
     Alan Robertson <alanr <at> unix.sh>

"Openness is the foundation and preservative of friendship...  Let me claim
from you at all times your undisguised opinions." - William Wilberforce

Mark Watts | 2 Jun 2003 16:01

Re: Simultaneous heartbeat startup weirdness...


> Mark Watts wrote:
> > Not sure if this is a bug or one of those things you can't fix, but...
> >
> > I have two identical directors (same hardware and os etc). If I power
> > them on at the same time (eg: after a power outage) when heartbeat starts
> > on both boxes, they _both_ take the VIP and both become the active node.
> > Restarting heartbeat on one of the boxes fixes this, but it's something I
> > have to do manually.
> >
> > Is there a way to stagger the startup times or add a startup delay to a
> > config file or something?
>
> This is not a known bug in recent releases.  What version are you running?
> Are you running nice_failback on or off?
>
> I test specifically for this - about 1000 times before each stable new
> release.

This is a mandrake packaged 1.0.2.

nice_failback is ON.

Mark.

-- 
Mark Watts
Systems Engineer
QinetiQ TIM
St Andrews Road, Malvern

Alan Robertson | 2 Jun 2003 16:14

Re: Simultaneous heartbeat startup weirdness...

Mark Watts wrote:
> 
> 
>>Mark Watts wrote:
>>
>>>Not sure if this is a bug or one of those things you can't fix, but...
>>>
>>>I have two identical directors (same hardware and os etc). If I power
>>>them on at the same time (eg: after a power outage) when heartbeat starts
>>>on both boxes, they _both_ take the VIP and both become the active node.
>>>Restarting heartbeat on one of the boxes fixes this, but it's something I
>>>have to do manually.
>>>
>>>Is there a way to stagger the startup times or add a startup delay to a
>>>config file or something?
>>
>>This is not a known bug in recent releases.  What version are you running?
>>Are you running nice_failback on or off?
>>
>>I test specifically for this - about 1000 times before each stable new
>>release.
> 
> 
> This is a mandrake packaged 1.0.2.
> 
> nice_failback is ON.

OK.  This is *very* odd.  To my knowledge, nice_failback has never exhibited 
this problem in any release. It is quite paranoid about this kind of thing.


Horms | 2 Jun 2003 16:19

Re: Simultaneous heartbeat startup weirdness...

On Mon, Jun 02, 2003 at 08:00:30AM -0600, Alan Robertson wrote:
> Mark Watts wrote:
> >
> >Not sure if this is a bug or one of those things you can't fix, but...
> >
> >I have two identical directors (same hardware and os etc). If I power them 
> >on at the same time (eg: after a power outage) when heartbeat starts on 
> >both boxes, they _both_ take the VIP and both become the active node.
> >Restarting heartbeat on one of the boxes fixes this, but it's something I 
> >have to do manually.
> >
> >Is there a way to stagger the startup times or add a startup delay to a 
> >config file or something?
> 
> 
> This is not a known bug in recent releases.  What version are you running? 
> Are you running nice_failback on or off?
> 
> I test specifically for this - about 1000 times before each stable new 
> release.

I am wondering if the problem relates to the deadtime
being too low. It would be interesting to see
if increasing that value alleviates the problem.
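
One way to try this, sketched with made-up values rather than anything from
Mark's configuration: raise deadtime in /etc/ha.d/ha.cf and set warntime well
below it, so that late heartbeats are reported in the logs before a node is
actually declared dead.

```
# /etc/ha.d/ha.cf (illustrative values only)
# seconds between heartbeats
keepalive 2
# log a warning when a heartbeat is this many seconds late
warntime 10
# seconds of silence before a peer is declared dead
deadtime 30
```

If late-heartbeat warnings appear around startup, that would point towards
the timing values being too tight for this hardware.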

-- 
Horms

Alan Robertson | 2 Jun 2003 16:37

Re: Simultaneous heartbeat startup weirdness...

Horms wrote:
> On Mon, Jun 02, 2003 at 08:00:30AM -0600, Alan Robertson wrote:
> 
>>Mark Watts wrote:
>>
>>>Not sure if this is a bug or one of those things you can't fix, but...
>>>
>>>I have two identical directors (same hardware and os etc). If I power them 
>>>on at the same time (eg: after a power outage) when heartbeat starts on 
>>>both boxes, they _both_ take the VIP and both become the active node.
>>>Restarting heartbeat on one of the boxes fixes this, but it's something I 
>>>have to do manually.
>>>
>>>Is there a way to stagger the startup times or add a startup delay to a 
>>>config file or something?
>>
>>
>>This is not a known bug in recent releases.  What version are you running? 
>>Are you running nice_failback on or off?
>>
>>I test specifically for this - about 1000 times before each stable new 
>>release.
> 
> 
> I am wondering if the problem relates to the deadtime
> being too low. It would be interesting to see
> if increasing that value alleviates the problem.

If deadtime were too low, I believe he would be having other complaints - 
about split brain/partitioned cluster causing a restart.

Mark Watts | 2 Jun 2003 16:53

Re: Simultaneous heartbeat startup weirdness...


> OK.  This is *very* odd.  To my knowledge, nice_failback has never
> exhibited this problem in any release. It is quite paranoid about this kind
> of thing.
>
> Is there a possibility that you enabled one or both of the two machines to
> start the virtual IP address in the Mandrake system startup configuration?
> If you do, then heartbeat will put messages in the logs complaining about
> it.
>
> Send me the logs.  Better yet, reproduce the problem with debug on, and
> send me *those* logs.
>
> If you start it up on both machines at about the same time with
> 	/usr/lib/heartbeat/heartbeat -d
> that should do it.

Mandrake's config utilities had nothing to do with these boxes, and the VIP is 
only mentioned in the heartbeat configs.

I'll try the debug stuff and get back to you.

-- 
Mark Watts
Systems Engineer
QinetiQ TIM
St Andrews Road, Malvern
GPG Public Key ID: 455420ED
