Paul Jakma | 1 Jun 2005 07:54

[quagga-dev 3475] Re: [PATCH] non-blocking I/O from client daemons to zebra

On Tue, 31 May 2005, John Payne wrote:

> Will this make it into 0.98.4 ?

As Hasso indicated - unlikely.

It's a fairly deep change. Simple changes in some ways, but it does 
significantly change how bgpd will process routes, which could have 
side-effects. So it needs shake-down time in 0.99.

Also, the zebra rib_process work-queue seems to have uncovered 
problems (eg malformed IPv6 ROUTE_ADD messages which caused clients 
to assert). And zebra needs a few other things converted to queues 
possibly (eg kernel_read() probably).

Finally, converting to queueing is simply a fundamental change which 
will require a period of observation. One potential pitfall is a 
self-reinforcing feedback loop in terms of system load, eg RAM 
pressure causing a queue to fill up faster than it is processed, with 
the additional RAM pressure adding to system load and causing even 
further backlog of the queue, more RAM used, even slower processing 
of the queue, etc.
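
The feedback loop described above can be illustrated with a toy model. This is a hypothetical simulation, not Quagga code: the function name, its parameters and the linear slowdown rule are all assumptions chosen only to show how a backlog can become self-sustaining once processing slows as the queue grows.

```c
/* Toy model (assumption, not Quagga code): every tick `arrivals` items
 * are enqueued, and the number of items processed per tick drops by
 * `slowdown_per_1000` for each 1000 items of backlog, standing in for
 * RAM pressure slowing the whole system down. */
static long simulate_backlog(long initial_backlog, long arrivals,
                             long base_rate, long slowdown_per_1000,
                             int ticks)
{
    long backlog = initial_backlog;

    for (int t = 0; t < ticks; t++) {
        backlog += arrivals;

        /* Service rate degrades with backlog, but never below 1. */
        long rate = base_rate - (backlog / 1000) * slowdown_per_1000;
        if (rate < 1)
            rate = 1;

        /* Process at most `rate` items this tick. */
        long done = backlog < rate ? backlog : rate;
        backlog -= done;
    }
    return backlog;
}
```

In this model a queue whose arrival rate is below its base service rate stays empty, and without the slowdown term even a large burst drains over time. With the slowdown term, the same burst can push the service rate below the arrival rate, after which the backlog only grows.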

There's a whole sub-branch of CS dedicated to queueing[1], so it's 
probably wise to let these changes 'soak' for a while and see whether 
we need heuristics to try to catch situations where a queue is congested 
and needs extra time spent on clearing it - I'm hopeful we won't.

1. A complicated field, which in practice is often simply avoided 
altogether by over-providing on capacity/resources :)

Paul Jakma | 1 Jun 2005 13:27

[quagga-dev 3476] ripd authentication can't be unset

Hi,

It's currently impossible to turn authentication off in ripd. The 
default is simple authentication (even if no key is set). See:

 	http://hibernia.jakma.org/~paul/patches/quagga-ripd-noauth.diff

which makes noauth the default unless authentication is actually 
configured.

It changes behaviour slightly though. In order to be perfectly 
backwards compatible we would have to extend ip rip auth, eg:

   ip rip authentication noauth

but that seems quite silly. Also, anyone who relied on simple being 
the default should just be able to 'write file' and have a 
forward-compatible configuration file written out, /before/ 
upgrading.
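
The behaviour change can be summarised in a small sketch. All names here (`auth_mode`, `rip_if_conf`, the two helper functions) are hypothetical and are not taken from ripd or from the patch; the sketch only contrasts the two defaults described above, under the assumption that "configured" means the operator explicitly set a mode.

```c
#include <stddef.h>

/* Hypothetical names; ripd's real identifiers may differ. */
enum auth_mode { AUTH_NONE, AUTH_SIMPLE, AUTH_MD5 };

struct rip_if_conf {
    int mode_configured;   /* nonzero if a mode was set explicitly */
    enum auth_mode mode;   /* only meaningful when mode_configured != 0 */
    const char *key;       /* password, or NULL if none is set */
};

/* Behaviour described in the post: simple authentication is used
 * unless something else was configured, even when no key is set. */
static enum auth_mode effective_mode_old(const struct rip_if_conf *c)
{
    return c->mode_configured ? c->mode : AUTH_SIMPLE;
}

/* Proposed behaviour: no authentication unless it is configured. */
static enum auth_mode effective_mode_new(const struct rip_if_conf *c)
{
    return c->mode_configured ? c->mode : AUTH_NONE;
}
```

The two functions differ only for an interface with nothing configured, which is why a configuration that states its mode explicitly behaves the same before and after the change.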

What say?

regards,
--

-- 
Paul Jakma	paul <at> clubi.ie	paul <at> jakma.org	Key ID: 64A2FF6A
Fortune:
If at first you don't succeed, redefine success.
Paul Jakma | 1 Jun 2005 13:35

[quagga-dev 3477] Re: [PATCH] non-blocking I/O from client daemons to zebra

On Mon, 30 May 2005, Simon Talbot wrote:

> It could also be happenstance, but it seems like some sessions are 
> re-setting more often than they used to -- it could be that more 
> engineering work is going on than normal with our peers, so I need 
> to watch this for considerably longer to be sure.

Any further information on this? Was it happenstance, or is there a 
problem?

regards,
--

-- 
Paul Jakma	paul <at> clubi.ie	paul <at> jakma.org	Key ID: 64A2FF6A
Fortune:
The whole world is a tuxedo and you are a pair of brown shoes.
 		-- George Gobel
se.ofbiz | 1 Jun 2005 17:54

[quagga-dev 3478] It must be a bug of OSPF apiserver!

When I use ospfclient to dump ism_change notifications, I don't get any proper messages.
I ran several tests to see which interface changes ospfclient receives, but every message I
receive is "state:1 [down]", even though ospfd, with debugging turned on, reports the interface state
going from 1 to 3, from 3 to 6, and from 6 to 7.
I studied the source and found this in
ospf_apiserver.c:

ospf_apiserver_clients_notify_ism_change (struct ospf_interface *oi)
{
  struct msg *msg;
  struct in_addr ifaddr = { 0L };
  struct in_addr area_id = { 0L };

  assert (oi);
  assert (oi->ifp);

  if (oi->address)
    {
      ifaddr = oi->address->u.prefix4;
    }
  if (oi->area)
    {
      area_id = oi->area->area_id;
    }

  msg = new_msg_ism_change (0, ifaddr, area_id, oi->ifp->status);
  if (!msg)
    {
      zlog_warn ("apiserver_clients_notify_ism_change: msg_new failed");
      return;
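
The excerpt is cut off, so the poster's own diagnosis is not visible here. One possible reading, offered only as an assumption, is that the last argument to new_msg_ism_change() carries the underlying interface's status field rather than the state of the OSPF interface state machine. The sketch below uses mock types, not the real Quagga structures. The `state` member, the helper names and the enum names for states 3, 6 and 7 are assumptions; only the numbers 1, 3, 6, 7 and the "[down]" label for 1 come from the post.

```c
/* Mock types for illustration only; not the real Quagga structures. */
enum mock_ism_state {
    MOCK_ISM_DOWN    = 1,   /* printed as "[down]" by the client */
    MOCK_ISM_WAITING = 3,   /* names for 3, 6, 7 are assumed */
    MOCK_ISM_BACKUP  = 6,
    MOCK_ISM_DR      = 7
};

struct mock_ifp {
    unsigned char status;   /* status of the underlying interface */
};

struct mock_oi {
    struct mock_ifp *ifp;
    int state;              /* assumed: OSPF interface state machine state */
};

/* What the excerpt passes as the last argument: oi->ifp->status. */
static int reported_as_posted(const struct mock_oi *oi)
{
    return oi->ifp->status;
}

/* Assumed alternative: report the state machine's own state. */
static int reported_from_ism(const struct mock_oi *oi)
{
    return oi->state;
}
```

Under this reading, a client would keep seeing the same value for as long as the interface status field does not change, regardless of the transitions ospfd logs.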

Paul Jakma | 1 Jun 2005 18:48

[quagga-dev 3479] Re: [PATCH] non-blocking I/O from client daemons to zebra

NB:

The workqueue bits are now in CVS. So CVS snapshots as of tomorrow 
morning 0700 (GMT) should have the bgpd work-queue stuff.

--paulj
Simon Talbot | 3 Jun 2005 02:23

[quagga-dev 3480] Re: [PATCH] non-blocking I/O from client daemons to zebra

Bad news Paul -- During our maintenance window this evening, we had to
do a cold re-start of the routers upon which I am running the work-queue
version of the patch etc. During start-up (remember, two transit routers
+ 81 peers to come up) I ended up with a kernel panic (IRQ Not syncing
etc.). I only had a very limited amount of time to investigate this, but
I could reproduce it by cold starting quagga: not every time, but
about 70% of the time. I also recreated it on two routers. Both routers
have now been rolled back to 0.99.0 and are stable through re-boots etc.

Also noticed that with the work queue patch, the routers were
considerably slower at bringing all sessions up (when they did not
kernel panic)

I am sorry, in diagnostic terms I have very little for you as I could
not get the details of the kernel panic (I was working through a 2 line
by 16 character LCD -- No laptop with me, was meant to be an easy
changeover of a couple of cards !)

It is going to be very hard for me to do further testing on this, as it
causes route flap with our peers each time I carry out a test, and they
start to get a little prickly !

I am guessing, but under extremely heavy load bgpd is probably
running the kernel out of buffers/memory, hence the panic, or the
queues are going haywire.

Sorry I can't be too much more help -- was a bad night !

Simon


Paul Jakma | 3 Jun 2005 09:12

[quagga-dev 3481] Re: [PATCH] non-blocking I/O from client daemons to zebra

Hi Simon,

On Fri, 3 Jun 2005, Simon Talbot wrote:

> Bad news Paul -- During our maintenance window this evening, we had 
> to do a cold re-start of the routers upon which I am running the 
> work-queue version of the patch etc. During start-up (remember, two 
> transit routers + 81 peers to come up) I ended up with a kernel 
> panic (IRQ Not syncing etc.). I only had a very limited amount of 
> time to investigate this, but I could reproduce it by cold starting 
> quagga: not every time, but about 70% of the time. I also recreated 
> it on two routers. Both routers have now been rolled back to 0.99.0 
> and are stable through re-boots etc.

Hmm, that is bad news.

However, that sounds like a kernel or hardware problem.

> Also noticed that with the work queue patch, the routers were 
> considerably slower at bringing all sessions up (when they did not 
> kernel panic)

Odd.

> I am sorry, in diagnostic terms I have very little for you as I 
> could not get the details of the kernel panic (I was working 
> through a 2 line by 16 character LCD -- No laptop with me, was 
> meant to be an easy changeover of a couple of cards !)

Ok.

Simon Talbot | 3 Jun 2005 10:24

[quagga-dev 3482] Re: [PATCH] non-blocking I/O from client daemons to zebra

> Having additional test boxes really helps. Also, one of the advantages
> of Quagga is that you should have saved enough money to be able to
> install a second router. With the trend towards ethernet interconnects
> that can be more easily shared without fancy hardware (compared to
> STM/Ex)..

The routers in question are already a cluster, two identical hardware
units using Linux-HA techniques for failover. With the work queues
installed, both behaved in exactly the same way and kernel panicked;
once rolled back to a known good version they were fine. They are routers
of which we have about 20 throughout the business, all running the exact
same kernel build (our own custom wrapping of 2.4.22) and are (touch
wood) extremely stable -- I cannot remember the last time any of them
ever re-booted or kernel panicked. To have two of them behave in
the same way (kernel panic) with a software version change, and then
behave properly again when the version is rolled back, would definitely
lead me to the software as the root cause -- even if it is actually
memory starvation which is causing the panic.

The routers are both 512MB Ram with no swap disk -- well they have no
disk at all -- so when they run out of RAM, they really run out !

The following is the memory stats of one of the routers, when stable
running 0.99.0

             total       used       free     shared    buffers     cached
Mem:        506296     255988     250308          0        336      84372
-/+ buffers/cache:     171280     335016
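
For readers unfamiliar with this output, the "-/+ buffers/cache" row is derived from the "Mem:" row. The sketch below shows the arithmetic; the struct and function names are hypothetical, and the values are assumed to be in kB, which is the default unit of free(1).

```c
/* Values as printed on the "Mem:" row of free(1), assumed to be in kB. */
struct mem_stats {
    long total;
    long used;
    long free_kb;
    long buffers;
    long cached;
};

/* Memory used by applications: "used" minus buffers and page cache. */
static long used_minus_cache(const struct mem_stats *m)
{
    return m->used - m->buffers - m->cached;
}

/* Memory available to applications: "free" plus buffers and page cache. */
static long free_plus_cache(const struct mem_stats *m)
{
    return m->free_kb + m->buffers + m->cached;
}
```

With the figures above, 255988 - 336 - 84372 = 171280 and 250308 + 336 + 84372 = 335016, matching the second row of the output.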

Paul Jakma | 3 Jun 2005 11:01

[quagga-dev 3483] Re: [PATCH] non-blocking I/O from client daemons to zebra

On Fri, 3 Jun 2005, Simon Talbot wrote:

> The routers in question are already a cluster, two identical 
> hardware units using Linux-HA techniques for failover.

Phew :)

> With the work queues installed, both behaved in exactly the same 
> way and kernel panicked; once rolled back to a known good version 
> they were fine. They are routers of which we have about 20 
> throughout the business, all running the exact same kernel build 
> (our own custom wrapping of 2.4.22) and are (touch wood) extremely 
> stable -- I cannot remember the last time any of them ever 
> re-booted or kernel panicked. To have two of them behave in the 
> same way (kernel panic) with a software version change, and then 
> behave properly again when the version is rolled back, would 
> definitely lead me to the software as the root cause -- even if it 
> is actually memory starvation which is causing the panic.

Highly odd. Even excessive memory usage by an application still 
shouldn't cause a panic.

> The routers are both 512MB Ram with no swap disk -- well they have no
> disk at all -- so when they run out of RAM, they really run out !

Hehe. Still shouldn't cause a panic though.

> The following is the memory stats of one of the routers, when stable
> running 0.99.0
>


[quagga-dev 3484] Re: rfc2385 problem

Rick Payne wrote:
>> Also note that there are reports that MD5 patch crashes kernel.
>> Although I can reproduce it randomly, I have no resources and
>> knowledge to dig into it. Someone with more knowledge about kernel
>> internals might have idea:
> 
> 
> I've a few ideas, but not had the time or inclination to look at that 
> code for ages.
> 
> Rick

Hi,
we have the same strange kernel panic as mentioned before, with quagga:

kernel BUG at slab.c:1130!

- the problem occurs at random after quagga starts; sometimes the router 
stays up for 1 minute, sometimes for 2-3 days
- system is Debian Woody, quagga_0.98.3-0.backports
- vanilla kernel 2.4.28+rfc2385-2.4.28 patch, also tested vanilla kernel 
2.4.29+rfc2385-2.4.28.patch, kernel-2.4.30+rfc2385-2.4.30.patch

- our first idea was that it is a hardware problem, so we copied the 
system to new hardware, but the system panicked again :(

 >> Also note that there are reports that MD5 patch crashes kernel.

Do you think our problem is the same as the one you described?

