Gilles Buisson | 9 Sep 2010 10:25

no more ODBC in DLZ ?

Hello all,

We run several BIND 9.5.2 DNS servers with DLZ, connected to an MSSQL
database server over ODBC (unixODBC and FreeTDS). The servers run Fedora.
Unfortunately, since the 9.6.0b1 release, the bind-spec source and RPM
changelog say:
  - don't build ODBC and Berkeley DB DLZ drivers

I've managed to build BIND with the --with-dlz-odbc option. The build
succeeds, but ODBC doesn't work.

Is this expected? Is there no longer any way to have an ODBC backend for BIND?

Regards

Gilles Buisson

------------------------------------------------------------------------------
This SF.net Dev2Dev email is sponsored by:

Show off your parallel programming skills.
Enter the Intel(R) Threading Challenge 2010.
http://p.sf.net/sfu/intel-thread-sfd
Graeme Fowler | 9 Sep 2010 11:10

Re: no more ODBC in DLZ ?

On Thu, 2010-09-09 at 10:25 +0200, Gilles Buisson wrote:
> We use several Bind DNS servers 9.5.2 with DLZ connected to MSSQL database
> server by ODBC (unixODBC and FreeTDS), servers are Fedora
> Unfortunately since the 9.6.0b1 release, in the bind-spec source and RPM
> changelog we can read : 
>   - don't build ODBC and Berkeley DB DLZ drivers

That's a distribution packaging change, which you should be raising with
the Fedora package maintainers. The date is pretty significant:

* Mon Nov 10 2008 Adam Tkac <atkac redhat com> 32:9.6.0-0.1.b1
- 9.6.0b1 release
- don't build ODBC and Berkeley DB DLZ drivers

...because it's almost two years ago! This isn't exactly a recent
change.

Graeme

Michael J. Ayers | 29 Sep 2010 23:18

BIND-DLZ cache corruption issue.

Hey all,


We recently ran into an issue where BIND-DLZ suffered complete cache corruption due to a single bad entry in our zone.  We had a TXT/SPF record in our zone that was accidentally set to over 255 characters in length.  When the record was looked up and cached, BIND apparently read the first 255 bytes of the record and truncated the rest (including the closing quote).  This had the effect of making the entire cache unusable.  The symptoms: initial lookups would work, with a small delay while the cache was checked, but all subsequent requests for cached entries would not return a result.  This made our DNS completely unusable until the issue was tracked down.

I know that there should be some sort of check in the data entry/management application for the zone to validate that the records entered are correct.  However, does it not make sense to do some sort of per-entry check to make sure a record is valid before accepting it and returning a response to the request?  BIND traditionally checks for these types of issues at startup, but when utilizing DLZ this startup check is bypassed.  I guess I am requesting that some form of record/entry checking be added to the code to prevent issues which might make DNS completely unavailable.

Thoughts?

--M

--
Michael J. Ayers
Senior Systems Engineer
------------------------------------------------------------------------------
Start uncovering the many advantages of virtual appliances
and start using them to simplify application deployment and
accelerate your shift to cloud computing.
http://p.sf.net/sfu/novell-sfdev2dev
_______________________________________________
Bind-dlz-testers mailing list
Bind-dlz-testers <at> lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bind-dlz-testers
Rob Butler | 30 Sep 2010 16:41

Re: BIND-DLZ cache corruption issue.

DLZ uses the same functions BIND does for 'loading' zone data.  The only difference is this 'loading' is done on demand instead of on startup.

It shouldn't have affected your cache, as records from DLZ shouldn't be in the BIND cache.  Are you sure the truncated record was the root cause?

Rob

Michael J. Ayers | 30 Sep 2010 18:43

Re: BIND-DLZ cache corruption issue.

Positive.  You can probably replicate this yourself.  We are using BIND 6.5.1-3.P3 and 9.6.1-16-P3 here.  Add a zone for example.com (or other) and add an MX host to it then add a TXT record for SPF similar to the following:



We added a single additional MX host to the preceding record and the problem occurred.  If you set the logging to a debug level you can see the request come in from any MTA that checks and honors SPF.  When the request comes in, all initial requests that are not in cache work with a small delay (which, if I am not mistaken, is due to the cache scan taking longer), while lookups that should be in cache cannot be retrieved.  Under strace you can watch the daemon just spin through the cache and never find a result.

It might be important to point out that there was a problem with TXT records a few years ago regarding improper formatting and/or over-length entries, where BIND would fail to load the zone and would not output an error.  With extremely large zones this causes some debugging issues, since BIND might take 5-10 minutes to start and you only find out about the missing zone when no data gets returned for it upon a lookup.  This debugging issue, coupled with the startup time, is the exact reason we switched to DLZ.

I'd be willing to do some additional testing with this to identify exactly where and how the issue occurs, but it took our service offline for nearly 3 hours last week, as the systems engineers on hand didn't have the experience with BIND that I do and didn't know where to look for the issue.  Simply removing the additional entry and reloading fixes the problem.

Thanks,

Mike

Scott Haneda | 1 Oct 2010 00:49

Re: BIND-DLZ cache corruption issue.

On Sep 30, 2010, at 9:43 AM, "Michael J. Ayers" <ayerslists <at> gmail.com> wrote:

> Positive.  You can probably replicate this yourself.  We are using BIND 6.5.1-3.P3 and 9.6.1-16-P3 here.  Add a zone for example.com (or other) and add an MX host to it then add a TXT record for SPF similar to the following:

You can mix and match with DLZ, in that some zones can come out of a database like MySQL or PostgreSQL, and others can use the original text file method that almost all non-database-backed name servers use. 

I would be curious for you to replicate the zone problem in a text-file-based zone. Create the same format error, with the TXT record's string length being too long. 

I am guessing that either named-checkzone or named-checkconf will report errors. An rndc reload will probably fail, and perhaps named will not start/restart either. 

I wonder what the results will be. 

It's a tough call on the right thing to do. Badly formatted data in named seems to go through a set of checks, and if the problem is severe enough, named won't start. That seems logical. 

But by adding DLZ, those checks can no longer happen, since the zone is loaded dynamically. I think ideally I would want the entire zone to be skipped, and the zone skipping logged. 

If I remember correctly, you were getting crashes or a dead/stalled server. If that doesn't happen with non-DLZ settings, I think the DLZ side should be changed to behave the same way. 

In the meantime, how about a simple set of checks on the server side to validate? You can then check strlen() and also do other nice things, like making sure there is a trailing dot when needed, or not. You could also do domain validation: checking that the domain is registered, that it has its NS records correctly pointing to you and your backups, etc. 

It seems to me, regardless of what named does with the data, it's a good idea to validate that data first. 
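To make the idea concrete, here is a minimal sketch of such a pre-insert check, written in Python purely for illustration. It assumes the TXT data is stored in quoted master-file form (one or more "..." strings), which is what the thread implies; the names split_quoted, check_txt and has_trailing_dot are made up for this example and are not part of BIND or DLZ. It does not handle \DDD decimal escapes.

```python
MAX_CHARACTER_STRING = 255  # RFC 1035: each character-string has a one-octet length


def split_quoted(data):
    """Return the contents of each "..." string in data.

    Raises ValueError if a quote is left open or text appears outside quotes.
    Only simple backslash escapes (\\" and \\\\) are understood.
    """
    strings, current, in_quote, i = [], [], False, 0
    while i < len(data):
        ch = data[i]
        if in_quote:
            if ch == "\\" and i + 1 < len(data):
                current.append(data[i + 1])  # keep the escaped character
                i += 1
            elif ch == '"':
                strings.append("".join(current))
                current, in_quote = [], False
            else:
                current.append(ch)
        elif ch == '"':
            in_quote = True
        elif not ch.isspace():
            raise ValueError("text outside quotes at offset %d" % i)
        i += 1
    if in_quote:
        raise ValueError("unterminated quote")
    return strings


def check_txt(data):
    """Return a list of problems found in a TXT data field (empty if none)."""
    try:
        strings = split_quoted(data)
    except ValueError as exc:
        return [str(exc)]
    problems = []
    if not strings:
        problems.append("no quoted string found")
    for n, s in enumerate(strings):
        size = len(s.encode("utf-8"))
        if size > MAX_CHARACTER_STRING:
            problems.append(
                "string %d is %d bytes, limit is %d" % (n, size, MAX_CHARACTER_STRING)
            )
    return problems


def has_trailing_dot(name):
    """True if a host name is written as fully qualified (ends with a dot)."""
    return name.endswith(".")
```

The management application would call check_txt() before writing the row and refuse the change if the returned list is not empty.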
-- 
Scott * If you contact me off list replace talklists <at> with scott <at> *

Michael J. Ayers | 1 Oct 2010 05:46

Re: BIND-DLZ cache corruption issue.

This particular issue cannot be replicated with a flat text zone file.  That issue was fixed with, I believe, the 9.5 release.  BIND now properly throws an error when it encounters a TXT record that is longer than 255 bytes in length or is not properly closed in quotations.  So no, this issue is not repeatable outside of DLZ.  However, the zone that this issue happened on is a DLZ-based zone.  Even if I were to mix and match some zones in DLZ and some in flat text files, if the record loads out of the DLZ and is too long/truncated, it causes issues for every other zone being served because the global cache is corrupt.  It's not a massive problem, because most organizations will never have an SPF record that large, or they will concatenate two from separate zones together with a proper include (which has other negative effects).  That being said, it's pretty bad that it effectively takes down the entire service until the issue is located and resolved.


My suggestion/request is that, if this issue can be replicated in house (which I suspect will not be a problem, considering I can cause it at will here), each entry be checked for validity when read; if one fails, the record is rejected and not loaded, an error is written to the log, and normal operation continues.  The alternative is to force-truncate it and close off the entry with proper quotes (which I think is a bad way to go).
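
To illustrate the behaviour I am asking for (this is only a Python sketch of the logic, not BIND code, and every name in it is hypothetical): rows coming back from the database are checked one at a time, a bad TXT row is logged and dropped, and everything else is served as normal. Escape sequences are counted as written, so the length check errs on the strict side.

```python
import logging
import re

log = logging.getLogger("dlz-sketch")

# One quoted string in master-file form: "..." with backslash escapes.
QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"')
# The whole field: one or more quoted strings separated by whitespace.
WHOLE = re.compile(r'\s*(?:"(?:[^"\\]|\\.)*"\s*)+')


def txt_is_loadable(data):
    """True if data is properly quoted and no string exceeds 255 bytes."""
    if not WHOLE.fullmatch(data):
        return False
    return all(
        len(m.group(1).encode("utf-8")) <= 255 for m in QUOTED.finditer(data)
    )


def usable_records(rows):
    """Yield (name, rtype, data) rows, skipping and logging invalid TXT data."""
    for name, rtype, data in rows:
        if rtype.upper() == "TXT" and not txt_is_loadable(data):
            log.error(
                "rejecting TXT record for %s: bad quoting or string over 255 bytes",
                name,
            )
            continue
        yield name, rtype, data
```

With a zone holding one MX row, one truncated TXT row and one good TXT row, only the truncated row is dropped and the other two are still answered.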

--M

Graeme Fowler | 1 Oct 2010 10:45

Re: BIND-DLZ cache corruption issue.

On Thu, 2010-09-30 at 20:46 -0700, Michael J. Ayers wrote:
> This particular issue cannot be replicated with a flat text zone
> file.  That issue was fixed with I believe the 9.5 release.  BIND now
> properly throws an error when it encounters a TXT record that is
> longer than 255 bytes in length or is not properly closed in
> quotations.

So...

Set the database schema such that TXT records have a maximum length of
253 - char(253) in SQL parlance - and then ensure that the query BIND
makes through the DLZ driver has escaped quotes at either end. That
assumes that the quotes are included in the character string length as
defined in the RFC for TXT records (1034 or 1035 IIRC).

That way you eliminate the possibility of corrupt data as described. It
may still be possible to malform the data, but not in the way you
describe.
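
For reference, RFC 1035 (section 3.3) defines a character-string as a length octet followed by up to 255 octets of data; the quotes belong to the master-file text form and are not counted. A TXT record may also carry several such strings, and as I read RFC 4408 (section 3.1.3), SPF checkers join them without adding spaces. So an alternative to capping the column is to have the management application split long values on write. A rough Python sketch follows; to_txt_rdata is a made-up name, and whether a given DLZ driver passes multiple quoted strings through intact is an assumption you would need to test.

```python
def to_txt_rdata(value, limit=255):
    """Render value as one or more quoted strings of at most `limit` bytes each."""
    raw = value.encode("ascii")  # SPF text is plain ASCII; anything else raises
    # Cut into limit-sized pieces; an empty value still yields one empty string.
    chunks = [raw[i:i + limit] for i in range(0, len(raw), limit)] or [b""]
    quoted = []
    for chunk in chunks:
        # Escape backslashes first, then quotes, for master-file presentation.
        text = chunk.decode("ascii").replace("\\", "\\\\").replace('"', '\\"')
        quoted.append('"%s"' % text)
    return " ".join(quoted)
```

A 300-byte value comes out as two quoted strings of 255 and 45 bytes, so the closing quote can no longer be cut off.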

Graeme

Michael J. Ayers | 1 Oct 2010 18:14

Re: BIND-DLZ cache corruption issue.

Hey Graeme,


That might be an option; I just think it is the wrong way to handle the issue, for a couple of reasons.

First, the DLZ implementation is not documented in this way.  That means it would fix the issue for me and the organization I work for, but wouldn't resolve it for new or current adopters who are not paying attention to this thread.  If the documentation were to change I wouldn't feel this way, but I haven't seen a substantial change in 3-4 years, so I won't hold my breath.

Second, you cannot just limit TXT records in this way.  The data for the record is stored in the 'data' field just like every other record.  While no other records currently approach this length, that doesn't mean that a future one will not.  So it could be considered an artificial limitation to restrict the entire data field in this manner when I might have to open it up again in two years because DNSSEC implemented another record type.  It also means a one-off departure, again from the documented implementation, that I have to individually document here for those who pick up any work I might leave behind in the future.  I could go on and on.

My opinion, and it is just that, is that the proper way to resolve this is in the code itself, to gracefully handle this type of issue.  It does not seem like it would be a complex modification, and I think the gains of implementing it in this way far outweigh any potential losses.  The developers will have to speak to that though.  I am, however, more than willing to be involved in this process as a guinea pig, tester, etc., as they see fit.

Thanks,

Mike
