We are having a little trouble with OSPF and ip_conntrack
tables becoming full. Basically a router goes offline when its
conntrack table becomes full but, worse still, other downstream routers do not
reroute through alternate redundant paths when this happens. Whole sections of
our network disappear.
This is causing instability in a large network but I can
distil the situation down to a small example as follows:
Take four Linux routers connected
in a ring. Call them A, B, C and D. All the links are DSL at 2 Mbps.
Sending a ping from A to C,
we find that normally it goes out through B and the replies come back through
D. This is easily seen by the DSL link lights flashing all around the ring in
my lab setup.
Then I load the ip_conntrack
kernel module on unit B and set the connection table maximum size to only 2,
thus guaranteeing that it will be full.
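One way to confirm that the table on unit B really is full is to compare the current entry count with the limit. Below is a minimal Python sketch, assuming the kernel exposes both values as plain integers in proc files. The two default paths are assumptions taken from newer nf_conntrack kernels and will likely differ on an ip_conntrack system, so pass whichever paths your kernel provides. The function names are hypothetical.

```python
from pathlib import Path

# Assumed default locations (nf_conntrack naming); adjust for your kernel.
DEFAULT_COUNT = "/proc/sys/net/netfilter/nf_conntrack_count"
DEFAULT_MAX = "/proc/sys/net/netfilter/nf_conntrack_max"


def conntrack_usage(count_path=DEFAULT_COUNT, max_path=DEFAULT_MAX):
    """Return (entries, limit, fraction_used) read from the two files."""
    entries = int(Path(count_path).read_text().strip())
    limit = int(Path(max_path).read_text().strip())
    # Treat a zero limit as fully used to avoid dividing by zero.
    fraction = entries / limit if limit else 1.0
    return entries, limit, fraction


def is_full(entries, limit):
    """True when no further connections can be tracked."""
    return entries >= limit


if __name__ == "__main__":
    entries, limit, fraction = conntrack_usage()
    state = "FULL" if is_full(entries, limit) else "ok"
    print(f"conntrack: {entries}/{limit} ({fraction:.0%}) {state}")
```

Run periodically on each router, something like this could at least raise an alarm before a unit stops forwarding, although it does nothing by itself to make the neighbours reroute.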
The result is that the ping no
longer makes it through unit B; the ping fails.
Basically we have lost
communication with unit C (and B, of course).
Increasing the size of the
table back to 2048 allows the ping through B again and all is well.
This shows that a full
conntrack table will prevent packet forwarding.
Now I set unit B's conntrack
table back to 2, hence full, and wait a number of hours.
The result is that no routes
change in any unit and the ping never works.
This shows that OSPF never
realizes that unit B is no longer routing traffic. After all, all the links are
up and running. Units B and C drop off the net forever (or at least for the
three or four hours I have waited so far).
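To illustrate my reading of what is happening, here is a toy Python model of the four-router ring. It is only a sketch: it assumes all links have equal cost, it uses a plain breadth-first search as a stand-in for the real SPF calculation, and the `down` parameter is a hypothetical way of saying "this router is missing from the link-state view". The point is that the computed path depends only on which routers and links are advertised, not on whether a router actually forwards packets.

```python
from collections import deque

# The lab ring: A - B - C - D - A, all links assumed to have equal cost.
RING = {
    "A": ["B", "D"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "A"],
}


def shortest_path(graph, src, dst, down=()):
    """Breadth-first search from src to dst, skipping routers listed in `down`."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            return path
        for nxt in graph[node]:
            if nxt not in seen and nxt not in down:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # destination unreachable


if __name__ == "__main__":
    # B is still advertised as up, so traffic keeps being sent into it.
    print(shortest_path(RING, "A", "C"))
    # Only if B disappears from the topology does the path move to D.
    print(shortest_path(RING, "A", "C", down=("B",)))
```

In this model the route from A to C only moves over to D once B is removed from the topology, which matches what I see: B stays in the topology with its links up, so nothing changes.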
Hope I have explained this
clearly. Is this expected OSPF behavior? Are there some iptables rules I could
use to prevent this? Or do I have to rely on huge conntrack tables that won’t