Robert Fleming | 1 Nov 10:50 2005

Re: Re: PostgreSQL + non-ASCII encoding

Thomas,

Is the expense concern the computational complexity (escaping for SQL), 
the primary storage use (malloc'ing 5*READ_BLOCK_SIZE), or secondary 
storage use (not exploiting PostgreSQL's text compression)?  Perhaps the 
first two of these could be addressed by using PostgreSQL's 
parameterized query execution functions.  As for secondary storage use, 
my understanding is that both "text" and "bytea" can be stored 
compressed and uncompressed (at the operator's discretion).
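
As a rough illustration of the first two costs, here is a minimal sketch in C of escaping a block for a quoted bytea literal. It assumes the non-parameterized style in which non-printable bytes are written as \\ooo, which is where a worst case of five output bytes per input byte would come from. The names escape_bytea_sketch and ESCAPE_FACTOR are made up for illustration; this is neither libpq's nor dbmail's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Worst case in this sketch: every input byte becomes "\\ooo" (5 chars). */
#define ESCAPE_FACTOR 5

/*
 * Hypothetical helper, not a libpq or dbmail function.
 * Escapes `len` bytes from `in` for use inside a quoted SQL bytea literal
 * and returns the number of characters written (excluding the NUL).
 * `out` must hold at least ESCAPE_FACTOR * len + 1 bytes.
 */
size_t escape_bytea_sketch(const unsigned char *in, size_t len, char *out)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < len; i++) {
        unsigned char c = in[i];

        if (c == '\'') {
            /* Single quote: doubled. */
            out[n++] = '\'';
            out[n++] = '\'';
        } else if (c == '\\') {
            /* Backslash: four backslashes in the literal. */
            memcpy(out + n, "\\\\\\\\", 4);
            n += 4;
        } else if (c < 0x20 || c > 0x7e) {
            /* Non-printable byte: two backslashes plus three octal digits. */
            n += (size_t)sprintf(out + n, "\\\\%03o", (unsigned)c);
        } else {
            out[n++] = (char)c;
        }
    }
    out[n] = '\0';
    return n;
}
```

With parameterized execution (for example libpq's PQexecParams with a binary-format parameter), the block could be passed as-is, so neither the escaping pass nor the enlarged buffer would be needed.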

The patch I sent before was indeed sub-optimal w.r.t. performance, but 
fortunately was optimal w.r.t. my level of effort, portable among dbmail 
revisions, and adequate w.r.t. my performance requirements (I am my DB's 
sole user ;).

Regarding message body searching: it seems to me that searching a "text" 
column has the same complexities as searching "bytea".  In both cases, 
it would be necessary to transcode the search string (or the message 
block) to matching encodings.  (As you mentioned, switching entirely to 
UTF-8 would optimize for search -- at the expense of most other operations.)

Regards,
Robert

P.S. My selfish desire is for dbmail to not compel a specific DB 
encoding, because I'm combining dbmail and other data in the same DB -- 
I've patched dbmail to segregate its tables into its own schema.

Thomas Mueller wrote:


Ryo HAYAKAWA | 1 Nov 11:43 2005

Re: Re: PostgreSQL + non-ASCII encoding

Hello,

I made a similar patch before.

http://mailman.fastxs.net/pipermail/dbmail-dev/2005-September/007531.html

--
Ryo Hayakawa
ryo@...

Thomas Mueller | 1 Nov 20:25 2005

Re: PostgreSQL + non-ASCII encoding

Hi Robert,

> Is the expense concern the computational complexity (escaping for SQL),
> the primary storage use (malloc'ing 5*READ_BLOCK_SIZE), or secondary
> storage use (not exploiting PostgreSQL's text compression)?  Perhaps the
> first two of these could be addressed by using PostgreSQL's
> parameterized query execution functions.  As for secondary storage use,
> my understanding is that both "text" and "bytea" can be stored
> compressed and uncompressed (at the operator's discretion).

I only meant the storage requirements.

Datatype 'text'
http://www.postgresql.org/docs/8.0/interactive/datatype-character.html:
"Long strings are compressed by the system automatically, so the
physical requirement on disk may be less."

Datatype 'bytea'
http://www.postgresql.org/docs/8.0/interactive/datatype-binary.html says
nothing about compression. Probably because the ratio of size reduction
to CPU cost is much worse than for text fields.

Unfortunately Moore's Law doesn't apply to disk I/O; there is only very
little improvement every year, so I/O becomes more and more of a bottleneck.
That's why it's a good idea to spend more CPU to reduce I/O (compression).

> The patch I sent before was indeed sub-optimal w.r.t. performance, but
> fortunately was optimal w.r.t. my level of effort, portable among dbmail
> revisions, and adequate w.r.t. my performance requirements (I am my DB's
> sole user ;).

Robert Fleming | 2 Nov 00:11 2005

Re: Re: PostgreSQL + non-ASCII encoding

Thomas,

>I only meant the storage requirements.
>
>Datatype 'text'
>http://www.postgresql.org/docs/8.0/interactive/datatype-character.html:
>"Long strings are compressed by the system automatically, so the
>physical requirement on disk may be less."
>
>Datatype 'bytea'
>http://www.postgresql.org/docs/8.0/interactive/datatype-binary.html says
>nothing about compression. Probably because the ratio of size reduction
>to CPU cost is much worse than for text fields.
>
>Unfortunately Moore's Law doesn't apply to disk I/O; there is only very
>little improvement every year, so I/O becomes more and more of a bottleneck.
>That's why it's a good idea to spend more CPU to reduce I/O (compression).
>  
>
Both "text" and "bytea" are so-called "TOAST" types in PostgreSQL.  
TOAST types have some flexibility in storage strategy; have a look at 
http://www.postgresql.org/docs/8.0/interactive/storage-toast.html

I just did a little experiment to try it out: inserting long, 
compressible strings in sql_ascii and unicode databases, using storage 
types "extended" (i.e. compressed) and "external" (not compressed):

sql_ascii
  bytea
    extended

bugtrack | 3 Nov 10:53 2005

[DBMail 0000279]: Double quote in subject unescaped when FETCH ENVELOPE


The following issue has been SUBMITTED. 
====================================================================== 
http://www.dbmail.org/mantis/view.php?id=279 
====================================================================== 
Reported By:                kc
Assigned To:                
====================================================================== 
Project:                    DBMail
Issue ID:                   279
Category:                   IMAP daemon
Reproducibility:            always
Severity:                   minor
Priority:                   normal
Status:                     new
====================================================================== 
Date Submitted:             03-Nov-05 10:53 CET
Last Modified:              03-Nov-05 10:53 CET
====================================================================== 
Summary:                    Double quote in subject unescaped when FETCH
ENVELOPE
Description: 
If the subject contains a double quote ("), it is left unescaped, at least
in the FETCH ENVELOPE response.  So a subject like
    The file "about me"
ends up being returned in a response like
    "The file "about me""

The IMAP client is confused by the token "about", as it sees "The file " as
the subject instead of "The file \"about me\"".
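
For illustration, the expected escaping can be sketched in C as below. This is a minimal sketch, not dbmail's actual code: the function name imap_quote_sketch is hypothetical, and it assumes the value is sent as an IMAP quoted string, in which a double quote and a backslash are each preceded by a backslash.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not dbmail's actual code.
 * Writes `src` into `dst` as an IMAP quoted string: wrapped in double
 * quotes, with '"' and '\' each preceded by a backslash.
 * Returns the length written (excluding the NUL), or 0 if `dst_size`
 * is smaller than the worst-case requirement.
 */
size_t imap_quote_sketch(const char *src, char *dst, size_t dst_size)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    /* Worst case: every character escaped, plus two quotes and a NUL. */
    if (dst_size < 2 * strlen(src) + 3)
        return 0;

    dst[n++] = '"';
    for (; *src != '\0'; src++) {
        if (*src == '"' || *src == '\\')
            dst[n++] = '\\';
        dst[n++] = *src;
    }
    dst[n++] = '"';
    dst[n] = '\0';
    return n;
}
```

Applied to the subject above, this would yield "The file \"about me\"". Values containing CR, LF or 8-bit characters cannot be sent as quoted strings and would need a different representation (such as a literal), which this sketch does not cover.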

bugtrack | 3 Nov 11:02 2005

[DBMail 0000279]: Double quote in subject unescaped when FETCH ENVELOPE


The following issue has been ACKNOWLEDGED. 
====================================================================== 
http://www.dbmail.org/mantis/view.php?id=279 
====================================================================== 
Reported By:                kc
Assigned To:                paul
====================================================================== 
Project:                    DBMail
Issue ID:                   279
Category:                   IMAP daemon
Reproducibility:            always
Severity:                   minor
Priority:                   normal
Status:                     acknowledged
====================================================================== 
Date Submitted:             03-Nov-05 10:53 CET
Last Modified:              03-Nov-05 11:02 CET
====================================================================== 
Summary:                    Double quote in subject unescaped when FETCH
ENVELOPE
Description: 
If the subject contains a double quote ("), it is left unescaped, at least
in the FETCH ENVELOPE response.  So a subject like
    The file "about me"
ends up being returned in a response like
    "The file "about me""

The IMAP client is confused by the token "about", as it sees "The file " as
the subject instead of "The file \"about me\"".

igor | 5 Nov 13:41 2005

dbmail 2.1.x patch to support lmtp email+mailbox <at>

Hi,

Is there any patch for dbmail 2.1.x to support lmtp email+mailbox <at>?

bugtrack | 6 Nov 07:55 2005

[DBMail 0000280]: Encoded subject causes db error when save data to dbmail_subjectfield as length exceeds 100


The following issue has been SUBMITTED. 
====================================================================== 
http://www.dbmail.org/mantis/view.php?id=280 
====================================================================== 
Reported By:                kc
Assigned To:                
====================================================================== 
Project:                    DBMail
Issue ID:                   280
Category:                   Database layer
Reproducibility:            always
Severity:                   minor
Priority:                   normal
Status:                     new
====================================================================== 
Date Submitted:             06-Nov-05 07:55 CET
Last Modified:              06-Nov-05 07:55 CET
====================================================================== 
Summary:                    Encoded subject causes db error when save data to
dbmail_subjectfield as length exceeds 100
Description: 
The table dbmail_subjectfield allows only 100 characters for subjectfield.
If the subject is in UTF-8, it needs to be encoded, which makes it longer
than expected.

I suggest using "text" instead of varchar(100).  I don't know whether any
RFC limits the length of the subject line, though.
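
To give a feel for the growth, here is a minimal sketch in C that computes the length of a subject once it is wrapped as a single RFC 2047 base64 encoded-word ("=?charset?B?...?="). It is an assumption that the subject is stored in this encoded form; the function name encoded_word_len is hypothetical, and the sketch ignores the 75-character limit per encoded-word, which would make long subjects grow even more.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: length of one RFC 2047 base64 encoded-word,
 * "=?charset?B?<base64>?=", carrying `raw_len` bytes of text.
 * Ignores the 75-character limit per encoded-word.
 */
size_t encoded_word_len(const char *charset, size_t raw_len)
{
    size_t base64_len;

    assert(charset != NULL);
    /* Padded base64: 4 output characters per 3 input bytes, rounded up. */
    base64_len = 4 * ((raw_len + 2) / 3);
    /* "=?" + charset + "?B?" + base64 + "?=" */
    return 2 + strlen(charset) + 3 + base64_len + 2;
}
```

Under these assumptions, a subject of 70 UTF-8 bytes becomes 108 characters, which no longer fits in varchar(100); the largest subject that still fits is 66 bytes.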

And, shouldn't the subject line be decoded when save it to

bugtrack | 6 Nov 14:19 2005

[DBMail 0000280]: Encoded subject causes db error when save data to dbmail_subjectfield as length exceeds 100


A NOTE has been added to this issue. 
====================================================================== 
http://www.dbmail.org/mantis/view.php?id=280 
====================================================================== 
Reported By:                kc
Assigned To:                
====================================================================== 
Project:                    DBMail
Issue ID:                   280
Category:                   Database layer
Reproducibility:            always
Severity:                   minor
Priority:                   normal
Status:                     new
====================================================================== 
Date Submitted:             06-Nov-05 07:55 CET
Last Modified:              06-Nov-05 14:19 CET
====================================================================== 
Summary:                    Encoded subject causes db error when save data to
dbmail_subjectfield as length exceeds 100
Description: 
The table dbmail_subjectfield allows only 100 characters for subjectfield.
If the subject is in UTF-8, it needs to be encoded, which makes it longer
than expected.

I suggest using "text" instead of varchar(100).  I don't know whether any
RFC limits the length of the subject line, though.

And, shouldn't the subject line be decoded when save it to

bugtrack | 6 Nov 23:01 2005

[DBMail 0000280]: Encoded subject causes db error when save data to dbmail_subjectfield as length exceeds 100


A NOTE has been added to this issue. 
====================================================================== 
http://www.dbmail.org/mantis/view.php?id=280 
====================================================================== 
Reported By:                kc
Assigned To:                
====================================================================== 
Project:                    DBMail
Issue ID:                   280
Category:                   Database layer
Reproducibility:            always
Severity:                   minor
Priority:                   normal
Status:                     new
====================================================================== 
Date Submitted:             06-Nov-05 07:55 CET
Last Modified:              06-Nov-05 23:01 CET
====================================================================== 
Summary:                    Encoded subject causes db error when save data to
dbmail_subjectfield as length exceeds 100
Description: 
The table dbmail_subjectfield allows only 100 characters for subjectfield.
If the subject is in UTF-8, it needs to be encoded, which makes it longer
than expected.

I suggest using "text" instead of varchar(100).  I don't know whether any
RFC limits the length of the subject line, though.

And, shouldn't the subject line be decoded when save it to

