Matthias Andree | 9 Feb 00:14 2005

tackling sqlite byte-order problem (was: sqlite byte-order)

Fletcher reported an endianness bug vs. the sqlite backend.

> Shouldn't a sqlite database created on a sparc be readable from an intel?
> It looks to me like something in the sqlite interface is not byte swapping.
>
> gaos.cs.utexas.edu$ bogofilter </dev/null
> Invalid robx value (2379.220992).  Must be between 0.0 and 1.0
> gaos.cs.utexas.edu$ 
>
> gaos.cs.utexas.edu$ bogoutil -d wordlist.db | head
> .WORDLIST_VERSION 885731585 0 535900417

[...]

This should have been .WORDLIST_VERSION 20040500 0 20050207.
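The two reported numbers are exactly the expected ones with their four bytes reversed, which is what a missing byte swap would produce. A minimal sketch to check this, assuming the values are stored as 32-bit unsigned integers (the helper name swap32 is illustrative, not taken from the bogofilter source):

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value. */
uint32_t swap32(uint32_t x)
{
    return ((x & 0x000000FFu) << 24) |
           ((x & 0x0000FF00u) <<  8) |
           ((x & 0x00FF0000u) >>  8) |
           ((x & 0xFF000000u) >> 24);
}
```

swap32(885731585) gives 20040500 (0x34CB3101 becomes 0x0131CB34), and swap32(535900417) gives 20050207 (0x1FF13101 becomes 0x0131F11F).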

The bug is that sqlite3 always returns false when datastore queries the
"is swapped" state, but this problem is not sqlite specific and may also
apply to TDB (I haven't checked if the TDB on-disk structures are
independent of endianness).

How do we best tackle this?

I wonder if we should add a new token .ENDIAN32 (from datastore) that
stores the hex value 0x01020304 in natural byte order and use that to
figure if the database is byteswapped.
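A rough sketch of how such a marker could be written and checked, assuming the value is stored as four raw bytes in the writer's native order. The function and enum names here are hypothetical and do not come from the bogofilter datastore code:

```c
#include <stdint.h>
#include <string.h>

#define ENDIAN32_MAGIC 0x01020304u

enum byte_order { ORDER_NATIVE, ORDER_SWAPPED, ORDER_UNKNOWN };

/* Writer side: emit the magic value in the host's natural byte order. */
void endian32_store(unsigned char out[4])
{
    uint32_t magic = ENDIAN32_MAGIC;
    memcpy(out, &magic, sizeof magic);
}

/* Reader side: interpret the stored bytes on this host and classify them. */
enum byte_order endian32_check(const unsigned char stored[4])
{
    uint32_t v;
    memcpy(&v, stored, sizeof v);
    if (v == ENDIAN32_MAGIC)
        return ORDER_NATIVE;      /* written by a same-endian host */
    if (v == 0x04030201u)
        return ORDER_SWAPPED;     /* written by an opposite-endian host */
    return ORDER_UNKNOWN;         /* missing or corrupt marker */
}
```

Because all four bytes of the magic differ, a reader can tell a swapped database from a native one, and can also reject values that are neither.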

Any better ideas on the market?

I have found no means to let sqlite3 return if the database is in
[...]

David Relson | 9 Feb 01:10 2005

Re: tackling sqlite byte-order problem (was: sqlite byte-order)

On Wed, 09 Feb 2005 00:14:39 +0100
Matthias Andree wrote:

> Fletcher reported an endianness bug vs. the sqlite backend.
> 
> > Shouldn't a sqlite database created on a sparc be readable from an intel?
> > It looks to me like something in the sqlite interface is not byte swapping.
> >
> > gaos.cs.utexas.edu$ bogofilter </dev/null
> > Invalid robx value (2379.220992).  Must be between 0.0 and 1.0
> > gaos.cs.utexas.edu$ 
> >
> > gaos.cs.utexas.edu$ bogoutil -d wordlist.db | head
> > .WORDLIST_VERSION 885731585 0 535900417
> 
> [...]
> 
> This should have been .WORDLIST_VERSION 20040500 0 20050207.
> 
> The bug is that sqlite3 always returns false when datastore queries the
> "is swapped" state, but this problem is not sqlite specific and may also
> apply to TDB (I haven't checked if the TDB on-disk structures are
> independent of endianness).
> 
> How do we best tackle this?
> 
> I wonder if we should add a new token .ENDIAN32 (from datastore) that
> stores the hex value 0x01020304 in natural byte order and use that to
> figure if the database is byteswapped.

[...]

Matthias Andree | 9 Feb 10:52 2005

Re: tackling sqlite byte-order problem

David Relson <relson <at> osagesoftware.com> writes:

>> I wonder if we should add a new token .ENDIAN32 (from datastore) that
>> stores the hex value 0x01020304 in natural byte order and use that to
>> figure if the database is byteswapped.
>
> .ENDIAN32 sounds fine to me.  Clearly sqlite needs it.  Is it of value
> for the other DB's we support?
>
> Alternatively, since we _know_ that .WORDLIST_VERSION is of form
> YYYYMMDD, we can split it into YYYY and MMDD and know that YYYY>MMDD is
> correct while MMDD>YYYY indicates is_swapped.

Is YYYYMMDD stored in BCD format? That's news to me, I thought it was in
digital, and in that case, the distinction between YYYY and MMDD doesn't
fall on a byte boundary.
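To illustrate the point, here is a small sketch comparing the two encodings of 20040500. The helper to_bcd is hypothetical and only handles values of up to eight decimal digits:

```c
#include <stdint.h>

/* Pack up to eight decimal digits into BCD, one digit per 4-bit nibble. */
uint32_t to_bcd(uint32_t n)
{
    uint32_t bcd = 0;
    int shift;
    for (shift = 0; shift < 32 && n > 0; shift += 4) {
        bcd |= (n % 10) << shift;
        n /= 10;
    }
    return bcd;
}
```

In BCD, 20040500 becomes 0x20040500, so the upper two bytes hold YYYY (0x2004) and the lower two hold MMDD (0x0500). As a plain binary integer the same value is 0x0131CB34, whose upper half is 0x0131 (305) and lower half 0xCB34 (52020), neither of which is the year or the month and day.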

-- 
Matthias Andree
_______________________________________________
Bogofilter-dev mailing list
Bogofilter-dev <at> bogofilter.org
http://www.bogofilter.org/mailman/listinfo/bogofilter-dev

David Relson | 9 Feb 13:39 2005

Re: tackling sqlite byte-order problem

On Wed, 09 Feb 2005 10:52:46 +0100
Matthias Andree wrote:

> David Relson <relson <at> osagesoftware.com> writes:
> 
> >> I wonder if we should add a new token .ENDIAN32 (from datastore) that
> >> stores the hex value 0x01020304 in natural byte order and use that to
> >> figure if the database is byteswapped.
> >
> > .ENDIAN32 sounds fine to me.  Clearly sqlite needs it.  Is it of value
> > for the other DB's we support?
> >
> > Alternatively, since we _know_ that .WORDLIST_VERSION is of form
> > YYYYMMDD, we can split it into YYYY and MMDD and know that YYYY>MMDD is
> > correct while MMDD>YYYY indicates is_swapped.
> 
> Is YYYYMMDD stored in BCD format? That's news to me, I thought it was in
> digital, and in that case, the distinction between YYYY and MMDD doesn't
> fall on a byte boundary.

My bad.  It's binary, not BCD.  Splitting would be "divide by 10000" and
check for reasonable results.  As a sanity check I wrote the little
program I've attached.  It does an endian swap then prints the dates and
their yyyy and mmdd components.  Perhaps it's useful; perhaps it's not.

Run like so:

gcc -o yyyymmdd yyyymmdd.c ; ./yyyymmdd ; ./yyyymmdd 20020820 ; ./yyyymmdd 21050209

David

Matthias Andree | 10 Feb 02:02 2005

Re: tackling sqlite byte-order problem

David Relson <relson <at> osagesoftware.com> writes:

> My bad.  It's binary, not BCD.  Splitting would be "divide by 10000" and
> check for reasonable results.

Yup. I'd think we'd better add .ENDIAN32 at creation time to remove
hunting for dates. The YYYY must be >= 2002 (first bogofilter release)
and the MMDD <= 1231. We can gather dates almost everywhere, token
timestamp, .WORDLIST_VERSION, we can sanity check .ROBX, but whenever
the database is open for update, we'd better place an .ENDIAN32 tag for
efficiency.
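For comparison, the date-based fallback could look roughly like the sketch below. It assumes 32-bit YYYYMMDD values; the names and the upper bound of 9999 on the year are illustrative assumptions, not part of the rule stated above. The upper bound matters: with only YYYY >= 2002 and MMDD <= 1231, the swapped value 535900417 would split into 53590 and 0417 and pass as a date.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reverse the byte order of a 32-bit value. */
uint32_t swap32(uint32_t x)
{
    return ((x & 0x000000FFu) << 24) | ((x & 0x0000FF00u) << 8) |
           ((x & 0x00FF0000u) >> 8)  | ((x & 0xFF000000u) >> 24);
}

/* Split "divide by 10000" style and test YYYY >= 2002, MMDD <= 1231.
 * The YYYY <= 9999 bound is an added assumption (see lead-in). */
bool date_plausible(uint32_t v)
{
    uint32_t yyyy = v / 10000;
    uint32_t mmdd = v % 10000;
    return yyyy >= 2002 && yyyy <= 9999 && mmdd <= 1231;
}

/* Returns 0 if the raw value looks native, 1 if it looks byte-swapped,
 * and -1 if the heuristic cannot decide. */
int guess_swapped(uint32_t raw)
{
    bool native_ok  = date_plausible(raw);
    bool swapped_ok = date_plausible(swap32(raw));
    if (native_ok && !swapped_ok)
        return 0;
    if (swapped_ok && !native_ok)
        return 1;
    return -1;
}
```

On the values from the original report, guess_swapped(885731585) and guess_swapped(535900417) both return 1, while guess_swapped(20040500) returns 0. The undecided case is one reason an explicit .ENDIAN32 tag is the more robust choice.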

> As a sanity check I wrote the little
> program I've attached.  It does an endian swap then prints the dates and
> their yyyy and mmdd components.  Perhaps it's useful; perhaps it's not.

-- 
Matthias Andree

David Relson | 10 Feb 02:31 2005

Re: tackling sqlite byte-order problem

On Thu, 10 Feb 2005 02:02:34 +0100
Matthias Andree wrote:

> David Relson <relson <at> osagesoftware.com> writes:
> 
> > My bad.  It's binary, not BCD.  Splitting would be "divide by 10000" and
> > check for reasonable results.
> 
> Yup. I'd think we'd better add .ENDIAN32 at creation time to remove
> hunting for dates. The YYYY must be >= 2002 (first bogofilter release)
> and the MMDD <= 1231. We can gather dates almost everywhere, token
> timestamp, .WORDLIST_VERSION, we can sanity check .ROBX, but whenever
> the database is open for update, we'd better place an .ENDIAN32 tag for
> efficiency.

It seems that this is needed only for sqlite3, because the other
databases support is_swapped(), right?

Matthias Andree | 10 Feb 02:46 2005

Re: tackling sqlite byte-order problem

David Relson <relson <at> osagesoftware.com> writes:

>> Yup. I'd think we'd better add .ENDIAN32 at creation time to remove
>> hunting for dates. The YYYY must be >= 2002 (first bogofilter release)
>> and the MMDD <= 1231. We can gather dates almost everywhere, token
>> timestamp, .WORDLIST_VERSION, we can sanity check .ROBX, but whenever
>> the database is open for update, we'd better place an .ENDIAN32 tag for
>> efficiency.
>
> It seems that this is needed only for sqlite3, because the other
> databases support is_swapped(), right?

I'm not sure about TDB and QDBM. QDBM appears not to support sharing
database files across endianness, TDB is virtually undocumented and
documentation and code are notoriously out of synch.

I don't want to make a sqlite3-local solution if the code can be reused
for another DB backend.

-- 
Matthias Andree

David Relson | 10 Feb 02:48 2005

Re: tackling sqlite byte-order problem

On Thu, 10 Feb 2005 02:46:55 +0100
Matthias Andree wrote:

> David Relson <relson <at> osagesoftware.com> writes:
> 
> >> Yup. I'd think we'd better add .ENDIAN32 at creation time to remove
> >> hunting for dates. The YYYY must be >= 2002 (first bogofilter release)
> >> and the MMDD <= 1231. We can gather dates almost everywhere, token
> >> timestamp, .WORDLIST_VERSION, we can sanity check .ROBX, but whenever
> >> the database is open for update, we'd better place an .ENDIAN32 tag for
> >> efficiency.
> >
> > It seems that this is needed only for sqlite3, because the other
> > databases support is_swapped(), right?
> 
> I'm not sure about TDB and QDBM. QDBM appears not to support sharing
> database files across endianness, TDB is virtually undocumented and
> documentation and code are notoriously out of synch.
> 
> I don't want to make a sqlite3-local solution if the code can be reused
> for another DB backend.

Fair enough.  One extra symbol and a dozen lines of code (estimated) do
not a problem make.

David Relson | 25 Feb 16:20 2005

Re: 0.93.6 vs 0.94.0

Matthias,

I'm shifting this private thread to bogofilter-dev since I think it's
become of interest to the user base. 

On Fri, 25 Feb 2005 11:16:48 +0100 Matthias Andree wrote:

> On Thu, 24 Feb 2005, David Relson wrote:
> 
> > In my thoughts, the new release will be 0.94.0, rather than 0.93.6 because:
> > 
> >  1. the new run-time selection and auto-detection capabilities are significant
> >  2. the default mode will be non-transaction.
> > 
> > With a bit of history included, the reasoning is:
> 
> OK, some other bit of history reflected:
> 
> - we have traditionally released a new version (0.94.X) when we broke
>   compatibility, creating additional upgrade requirements and such
> 
> We are not breaking compatibility with the recent changes, so there is
> no reason to bump the minor revision, we can just bump patchlevel and be
> done.

Changing the default from transaction to non-transaction is my primary
reason for bumping the minor revision.

> > Bogofilter's 0.93.x series introduced 3 related changes:
> > 
[...]

.rp | 25 Feb 19:02 2005

Re: 0.93.6 vs 0.94.0

> > Our users are concerned with manageability, and I think if we make
> > them read the release notes of a version never released, they'll
> > jump at us, and with good reason. That we changed things (like
> > removing DB 3.1 support) from 0.93.J to 0.93.K is no big deal as
> > long as we don't do that again after a stable 0.93.M release.
> 
> One of our problems is that we can't make people _read_ the release notes,
> even for the current release!  There are still people who're running
> old versions, for example 0.17.5.  When they upgrade, there is a lot
> of reading that they _should_ do.  Having info on a "never stabilized"
> version is a minor concern.

Well, as one of those still using 0.1x, I would say that it would be very
useful if the ReleaseNotes were made available separately from the whole
dang package AND if they were better written for those who are not super
technical manual readers. (I read the notes for .91.x and was so confused
and bewildered by the language and layout that it simply froze me out.)

Maybe a completely separate document would better serve the BF community.
