Matthias Andree | 1 Nov 02:31 2004

Re: QDBM and transactions

Stefan Bellon <sbellon <at> sbellon.de> writes:

> BTW: Nobody has commented on my migration program. I used it to migrate
> my 27 MB Depot database to a 7 MB Villa database with the same content.
> It worked and is quicker now. Oh, and the hard disc is _much_ quieter
> during Bogofilter activities with Villa in contrast to Depot.

I've polished it and added it as a first-class citizen called
bogoQDBMupgrade in the bogofilter distribution.

The most important changes were:

1. the cmpkey function has seen an incompatible change to align QDBM and
   DB dump order - only affects people who have used the new Villa API
   version. dump/load should fix this.

   I wanted to "breakpair" this before 0.93.0 proper.

2. the program now checks if the output is already in B+tree format.

   Because the latter is just furniture inside the Hash format, dpopen
   will happily open a B+tree data base and dump B+tree pages rather
   than bogofilter wordlist records, nuking the user's data base.

3. the cmpkey function was split out into a separate file so we only
   have one central copy around.

-- 
Matthias Andree
_______________________________________________

Stefan Bellon | 1 Nov 09:16 2004

Re: QDBM and transactions

Matthias Andree wrote:

[snip]

> The most important changes were:

> 1. the cmpkey function has seen an incompatible change to align QDBM
>    and DB dump order - only affects people who have used the new
>    Villa API version. dump/load should fix this.

I don't completely understand. What are your changes? Could you post
your cmpkey?

> 2. the program now checks if the output is already in B+tree format.

>    Because the latter is just furniture inside the Hash format, dpopen
>    will happily open a B+tree data base and dump B+tree pages rather
>    than bogofilter wordlist records, nuking the user's data base.

I thought about this as well but as this is only relevant if somebody
moves from 0.93.0 to an older version I thought it wouldn't be that
important. If you want, you can always corrupt your database anyway.
But if you added the check: nice.

> 3. the cmpkey function was split out into a separate file so we only
>    have one central copy around.

Yes, that's something I wanted to do as well because of my background[1],
but as I wasn't sure whether you would accept my migration program, I
couldn't do it.
[...]

Stefan Bellon | 1 Nov 09:40 2004

Re: QDBM and transactions

Stefan Bellon wrote:
> Matthias Andree wrote:

> > 1. the cmpkey function has seen an incompatible change to align QDBM
> >    and DB dump order - only affects people who have used the new
> >    Villa API version. dump/load should fix this.

> I don't completely understand. What are your changes? Could you post
> your cmpkey?

OK, I have now seen your cmpkey in 0.92.100.cvs. You assign values
between incompatible pointer types:

int cmpkey(const char *aptrin, int asiz, const char *bptrin, int bsiz)
{
    int aiter, biter;
    const unsigned char *aptr = aptrin;
    const unsigned char *bptr = bptrin;

This works with GCC, but not with compilers that strictly adhere to
the C standard (e.g. Norcroft C on RISC OS).

I once asked one of the Norcroft C maintainers about this and got the
following answer (quoted with permission):

-----BEGIN QUOTE-----
Okay. "6.5.16.1 Simple Assignment" in C99 gives the list of
possibilities for types in a simple assignment. This assignment can
only possibly fall into the case:

[...]

Pavel Kankovsky | 1 Nov 10:58 2004

Re: QDBM and transactions

On Fri, 29 Oct 2004, Matthias Andree wrote:

> Stefan Bellon <sbellon <at> sbellon.de> writes:
> 
> > I've just had a look at the QDBM documentation. We currently use the
> > Depot API in QDBM. But there exists another one: the Villa API. And the
> > Villa API has transaction support. Perhaps we can indeed maintain a
> > transactional QDBM together with a BerkeleyDB one?
> 
> Feel free to take a stab and implement a transactional interface for
> QDBM; if you have any questions, this list is the right place to ask.
> 
> If you need a crash detector (because QDBM doesn't figure out by itself
> when it needs to recover/roll back; neither does Berkeley DB), check
> db_lock.c and how it's integrated in datastore_db.c.
> 

> We're effectively using an outer lock layer (traditional fcntl locked
> file) and an inner lock layer (special character cell based locking
> process tracker developed by Pavel and, to a lesser extent, me) which is currently
> tailored for Berkeley DB's needs but can be used by other data bases,
> too, if needed. The inner layer requires record locks, not sure if OS/2
> or RiscOS can provide these.

Matthias is giving me too much credit, he did all the hard work.

But let's get back to the topic: the locking mechanism depends on

1. one tri-state (unlocked/shared/exclusive) lock
   to guarantee exclusive access to recovery jobs et al (this is
[...]
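A minimal sketch of the tri-state lock in item 1, assuming it is implemented as one POSIX fcntl() record lock covering the whole lock file: F_UNLCK, F_RDLCK and F_WRLCK correspond to unlocked, shared and exclusive. The helper names set_lock_state and lock_demo and the /tmp path are hypothetical; this is not the code in db_lock.c.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not from db_lock.c): move a whole-file fcntl lock
 * into one of three states: F_UNLCK (unlocked), F_RDLCK (shared) or
 * F_WRLCK (exclusive). F_SETLKW blocks until the lock can be granted.
 * Returns 0 on success, -1 on error with errno set by fcntl(). */
int set_lock_state(int fd, short type)
{
    struct flock fl;

    memset(&fl, 0, sizeof fl);
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;               /* 0 = through end of file, whatever its size */
    return fcntl(fd, F_SETLKW, &fl);
}

/* Walk one scratch file through all three states.
 * Returns 0 if every transition succeeded, -1 otherwise. */
int lock_demo(void)
{
    char path[] = "/tmp/lockdemoXXXXXX";
    int fd = mkstemp(path);     /* opened O_RDWR, so both lock types are allowed */
    int rc = 0;

    if (fd < 0)
        return -1;
    if (set_lock_state(fd, F_RDLCK) != 0)               /* ordinary users: shared */
        rc = -1;
    if (rc == 0 && set_lock_state(fd, F_WRLCK) != 0)    /* recovery: exclusive */
        rc = -1;
    if (rc == 0 && set_lock_state(fd, F_UNLCK) != 0)    /* done: unlocked */
        rc = -1;
    close(fd);
    unlink(path);
    return rc;
}
```

Converting a shared lock into an exclusive one waits until every other holder has released theirs; two processes attempting that conversion at the same time can deadlock, which fcntl() may report as EDEADLK.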

Matthias Andree | 1 Nov 11:18 2004

Re: QDBM and transactions

Stefan Bellon <sbellon <at> sbellon.de> writes:

>    Irrespective of the choice made [whether char is signed or unsigned],
>    char is a separate type from the other two and is not compatible with
>    either.

> So, if you want aptr and bptr to be of type (unsigned char *) and
> aptrin and bptrin are of type (char *), then you _have_ to cast them:

Right.

>
>     const unsigned char *aptr = (unsigned char *) aptrin;
>     const unsigned char *bptr = (unsigned char *) bptrin;

The cast would have to be (const unsigned char *) to avoid a warning
about discarding the const qualifier. Fixed now in CVS.
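For illustration, a comparator with the signature from Stefan's snippet, using the const-preserving cast discussed above. The body is a hypothetical sketch, not the code committed to CVS: it assumes the goal is plain unsigned byte order with shorter keys first when one key is a prefix of the other, which is how Berkeley DB's default B-tree comparison orders keys.

```c
/* Hypothetical comparator, not the committed bogofilter code.
 * Plain char may be signed or unsigned, and it is a distinct type from
 * both signed char and unsigned char (C99 6.2.5), so converting
 * const char * to const unsigned char * needs an explicit cast.
 * Casting to (unsigned char *) would drop const and draw a warning. */
int cmpkey(const char *aptrin, int asiz, const char *bptrin, int bsiz)
{
    const unsigned char *aptr = (const unsigned char *) aptrin;
    const unsigned char *bptr = (const unsigned char *) bptrin;
    int minsiz = asiz < bsiz ? asiz : bsiz;
    int i;

    /* Compare as unsigned bytes, so 0xE4 sorts after 'z' whether or not
     * plain char is signed on this platform. */
    for (i = 0; i < minsiz; i++) {
        if (aptr[i] != bptr[i])
            return aptr[i] < bptr[i] ? -1 : 1;
    }
    /* Common prefix is equal: the shorter key sorts first. */
    return (asiz > bsiz) - (asiz < bsiz);
}
```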

-- 
Matthias Andree
_______________________________________________
Bogofilter-dev mailing list
Bogofilter-dev <at> bogofilter.org
http://www.bogofilter.org/mailman/listinfo/bogofilter-dev

Matthias Andree | 1 Nov 11:26 2004

Re: QDBM and transactions

Stefan Bellon <sbellon <at> sbellon.de> writes:

>> 2. the program now checks if the output is already in B+tree format.
>
>>    Because the latter is just furniture inside the Hash format, dpopen
>>    will happily open a B+tree data base and dump B+tree pages rather
>>    than bogofilter wordlist records, nuking the user's data base.
>
> I thought about this as well but as this is only relevant if somebody
> moves from 0.93.0 to an older version I thought it wouldn't be that
> important. If you want, you can always corrupt your database anyway.
> But if you added the check: nice.

Well, now that you mention it, db_open might want the inverse check,
too. The problem is running the converter twice: that will totally
trash the data base without reporting an error.

>> 3. the cmpkey function was split out into a separate file so we only
>>    have one central copy around.
>
> Yes, that's something I wanted to do as well because of my background[1],
> but as I wasn't sure whether you would accept my migration program, I
> couldn't do it.

Just ask next time. :)

-- 
Matthias Andree

Matthias Andree | 1 Nov 18:39 2004

Re: QDBM and transactions

"Pavel Kankovsky" <peak <at> argo.troja.mff.cuni.cz> writes:

> BTW: Matthias, you should avoid stdio calls in a signal handler. Most libc
> routines, including stdio, are not reentrant.

Gee, right, and malloc isn't reentrant either. :-(

Thank you.

I'll see to that now, and we should fix this before 0.93.0.
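A common way to make such a handler safe, sketched under stated assumptions: the handler only stores the signal number in a volatile sig_atomic_t and reports with write(2), which POSIX lists as async-signal-safe, whereas printf() and malloc() are not. It also saves and restores errno, because write() may change it. The names got_signal, on_signal and install_handler are hypothetical; this is not the handler in db_lock.c.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Set by the handler; the main program polls it and does the real
 * cleanup (stdio, free(), closing the data base) outside the handler. */
volatile sig_atomic_t got_signal = 0;

/* Only async-signal-safe operations in here: no stdio, no malloc. */
static void on_signal(int signo)
{
    static const char msg[] = "caught signal, cleaning up\n";
    int saved_errno = errno;      /* write() may clobber errno */
    ssize_t rc;

    got_signal = signo;
    rc = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void) rc;                    /* nothing safe to do if it fails */
    errno = saved_errno;
}

/* Install on_signal for signo. Returns 0 on success, -1 on error. */
int install_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(signo, &sa, NULL);
}
```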

-- 
Matthias Andree

Matthias Andree | 2 Nov 03:33 2004

Re: QDBM and transactions

"Pavel Kankovsky" <peak <at> argo.troja.mff.cuni.cz> writes:

> BTW: Matthias, you should avoid stdio calls in a signal handler. Most libc
> routines, including stdio, are not reentrant.

I believe I've fixed this now. Could you have another look at the
current db_lock.c file to see if the signal handler is safe now?

I'm attaching the file for your convenience.

-- 
Matthias Andree
/** \file db_lock.c
 * \brief Lock handler to detect application crashes
 * \author Matthias Andree
 * \date 2004
 *
 * GNU GPL v2
 * with optimization ideas by Pavel Kankovsky
 *
 * \attention
 * This code uses signal handlers and must pay extra attention to
 * reentrancy!
 *
 * \par Lock file layout:
 * the lock file has a list of cells, which can be either 0 or 1.
 * - 0 means: slot is free
 * - 1 means: slot in use
 * [...]
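The cell layout above suggests a crash detector along these lines, sketched here under assumptions: each process takes a non-blocking fcntl() write lock on its own one-byte cell and sets it to 1. The kernel releases a process's record locks when it exits, so a cell that still reads 1 but can be locked belonged to a process that died without clearing it. The ASCII '0'/'1' encoding and the names lock_cell, claim_cell, CELL_NONE_FREE and CELL_CRASHED are hypothetical, not taken from db_lock.c.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical sketch, not the db_lock.c implementation. Assumes the
 * lock file holds one byte per cell: '0' = free, '1' = in use. */
#define CELL_NONE_FREE (-1)
#define CELL_CRASHED   (-2)

/* Apply type (F_WRLCK or F_UNLCK) to the single byte at offset cell,
 * without blocking. Returns 0 on success, -1 if refused. */
int lock_cell(int fd, off_t cell, short type)
{
    struct flock fl;

    memset(&fl, 0, sizeof fl);
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = cell;
    fl.l_len = 1;
    return fcntl(fd, F_SETLK, &fl);
}

/* Claim the first free cell and keep its lock while the process runs.
 * Returns the cell index, CELL_CRASHED if a dead owner's cell was found,
 * or CELL_NONE_FREE (also used for I/O errors in this sketch). */
long claim_cell(int fd, long ncells)
{
    long i;
    char c;

    for (i = 0; i < ncells; i++) {
        if (lock_cell(fd, (off_t) i, F_WRLCK) != 0)
            continue;                       /* held by a live process */
        if (pread(fd, &c, 1, (off_t) i) != 1)
            break;
        if (c == '0') {
            c = '1';
            if (pwrite(fd, &c, 1, (off_t) i) != 1)
                break;
            return i;                       /* lock stays held */
        }
        /* Reads "in use" yet we got the lock: its owner died. */
        lock_cell(fd, (off_t) i, F_UNLCK);
        return CELL_CRASHED;
    }
    if (i < ncells)
        lock_cell(fd, (off_t) i, F_UNLCK);  /* I/O error path */
    return CELL_NONE_FREE;
}
```

fcntl() locks never conflict within a single process, so this only detects cells left behind by other processes; a clean exit would also reset the process's own cell to 0.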

.rp | 4 Nov 20:21 2004

Re: [Full-Disclosure] bogofilter-SA-2004-01

On 30 Oct 2004 at 15:22, Matthias Andree wrote:
> The pertinent change allowed the quoted-printable decoder to accept LF
> in encoded words but replaced it by a NUL character, which the calling
> function inside bogofilter could not handle. It attempted to write a NUL
> byte either one byte past the end of a buffer provided by the lexical
> analyzer or to an address that was the negative of the address of the
> first byte of the "encoded text" part of the encoded word that was
> supposed to be decoded.
> 
I apologize in advance for asking, but why not 
	set a flag when an encoded word is encountered 
	if LF comes up replace it with ~ 
	when encoded word ends set flag off.


Matthias Andree | 5 Nov 01:05 2004

Re: bogofilter-SA-2004-01

On Thu, 04 Nov 2004, .rp wrote:

> I apologize in advance for asking, but why not 
> 	set a flag when an encoded word is encountered 
> 	if LF comes up replace it with ~ 
> 	when encoded word ends set flag off.

No need to apologize for good questions.

An encoded word as per RFC-2047 does not contain line feed characters,
so we should not accept or attempt to decode them.

In fact, if we decoded a nonconformant string that closely resembles an
encoded word, we'd discard the information that there was a broken
RFC-2047 encoded word; the decoded word carries no trace of it.

Example:

An intact RFC-2047 encoded word such as

    Test-Header: =?iso-8859-1?q?n=E4h_b=e4h?=

yields

    get_token: 1 "head:Test-Header"
    get_token: 1 "head:näh"
    get_token: 1 "head:bäh"

    Test-Header: =?iso-8859-1?q?n=E4h b=e4h?=

(the same applies with an embedded LF)

[...]
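A minimal sketch of the stricter approach, assuming the caller has already isolated the encoded-text between "?q?" and "?=": it rejects CR, LF, space, tab and "?" instead of decoding around them, and it decodes in place, returning a length rather than writing a terminating NUL. Because Q-decoded output is never longer than its input, every write lands inside the buffer. The name q_decode_inplace and its return convention are hypothetical, not bogofilter's decoder.

```c
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Hypothetical sketch: decode the encoded-text of an RFC 2047 Q-encoded
 * word in place. buf holds len bytes (no NUL terminator needed).
 * Returns the decoded length, or -1 if the text contains some characters
 * RFC 2047 forbids in encoded-text or a malformed =XX escape. */
long q_decode_inplace(unsigned char *buf, size_t len)
{
    size_t in = 0, out = 0;

    while (in < len) {
        unsigned char c = buf[in];

        if (c == '\r' || c == '\n' || c == ' ' || c == '\t' || c == '?')
            return -1;                  /* broken word: do not decode */
        if (c == '_') {                 /* "_" stands for 0x20 */
            buf[out++] = ' ';
            in++;
        } else if (c == '=') {
            int hi, lo;

            if (len - in < 3)
                return -1;              /* truncated escape */
            hi = hexval(buf[in + 1]);
            lo = hexval(buf[in + 2]);
            if (hi < 0 || lo < 0)
                return -1;
            buf[out++] = (unsigned char) (hi * 16 + lo);
            in += 3;
        } else {
            buf[out++] = c;
            in++;
        }
    }
    return (long) out;
}
```

Decoding n=E4h_b=e4h yields the seven bytes n, 0xE4, h, space, b, 0xE4, h (ISO-8859-1 "näh bäh"), while the same text with an embedded space or LF returns -1.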

