Michael Parker | 1 Aug 2005 06:04
Picon
Favicon

Re: Performance problems testing with Spamassassin 3.1.0

Hi All,

As a SpamAssassin developer, who by my own admission has real problem
getting PostgreSQL to work well, I must thank everyone for their
feedback on this issue.  Believe me when I say what is in the tree now
is a far cry from what used to be there, orders of magnitude faster
for sure.  I think there are several good ideas that have come out of
this thread and I've set about attempting to implement them.

Here is a version of the stored procedure, based in large part by the
one written by Tom Lane, that accepts and array of tokens and loops
over them to either update or insert them into the database (I'm not
including the greatest_int/least_int procedures but you've all seen
them before):

CREATE OR REPLACE FUNCTION put_tokens(inuserid INTEGER,
                                     intokenary BYTEA[],
                                     inspam_count INTEGER,
                                     inham_count INTEGER,
                                     inatime INTEGER)
RETURNS VOID AS ' 
DECLARE
  _token BYTEA;
  new_tokens INTEGER := 0;
BEGIN
  for i in array_lower(intokenary, 1) .. array_upper(intokenary, 1)
  LOOP
    _token := intokenary[i];
    UPDATE bayes_token
       SET spam_count = greatest_int(spam_count + inspam_count, 0),
(Continue reading)

Tom Lane | 1 Aug 2005 06:42
Picon

Re: Performance problems testing with Spamassassin 3.1.0

Michael Parker <parkerm <at> pobox.com> writes:
> The next hurdle, and I've just posted to the DBD::Pg list, is
> escaping/quoting the token strings.

If you're trying to write a bytea[] literal, I think the most reliable
way to write the individual bytes is
	\\\\nnn
where nnn is *octal*.  The idea here is:
	* string literal parser takes off one level of backslashing,
	  leaving \\nnn
	* array input parser takes off another level, leaving \nnn
	* bytea input parser knows about backslashed octal values

Note it has to be 3 octal digits every time, no abbreviations.

			regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

               http://www.postgresql.org/docs/faq

Tom Lane | 1 Aug 2005 06:53
Picon

Re: Performance problems testing with Spamassassin 3.1.0

Michael Parker <parkerm <at> pobox.com> writes:
> sub bytea_esc {
>   my ($str) =  <at> _;
>   my $buf = "";
>   foreach my $char (split(//,$str)) {
>     if (ord($char) == 0) { $buf .= "\\\\000"; }
>     elsif (ord($char) == 39) { $buf .= "\\\\047"; }
>     elsif (ord($char) == 92) { $buf .= "\\\\134"; }
>     else { $buf .= $char; }
>   }
>   return $buf;
> }

Oh, I see the problem: you forgot to convert " to a backslash sequence.

It would probably also be wise to convert anything >= 128 to a backslash
sequence, so as to avoid any possible problems with multibyte character
encodings.  You wouldn't see this issue in a SQL_ASCII database, but I
suspect it would rise up to bite you with other encoding settings.

			regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Tom Lane | 1 Aug 2005 18:32
Picon

Re: [PATCHES] COPY FROM performance improvements

"Luke Lonergan" <llonergan <at> greenplum.com> writes:
> On 7/29/05 5:37 AM, "Bruce Momjian" <pgman <at> candle.pha.pa.us> wrote:
>> Where is the most recent version of the COPY patch?

> I've attached it here, sorry to the list owner for the patch inclusion /
> off-topic.

This patch appears to reverse out the most recent committed changes in
copy.c.

			regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match

Matthew Schumacher | 1 Aug 2005 19:08

Re: Performance problems testing with Spamassassin 3.1.0

Tom Lane wrote:
> Michael Parker <parkerm <at> pobox.com> writes:
> 
>>sub bytea_esc {
>>  my ($str) =  <at> _;
>>  my $buf = "";
>>  foreach my $char (split(//,$str)) {
>>    if (ord($char) == 0) { $buf .= "\\\\000"; }
>>    elsif (ord($char) == 39) { $buf .= "\\\\047"; }
>>    elsif (ord($char) == 92) { $buf .= "\\\\134"; }
>>    else { $buf .= $char; }
>>  }
>>  return $buf;
>>}
> 
> 
> Oh, I see the problem: you forgot to convert " to a backslash sequence.
> 
> It would probably also be wise to convert anything >= 128 to a backslash
> sequence, so as to avoid any possible problems with multibyte character
> encodings.  You wouldn't see this issue in a SQL_ASCII database, but I
> suspect it would rise up to bite you with other encoding settings.
> 
> 			regards, tom lane

Here is some code that applies Toms Suggestions:

38c39,41
<     if (ord($char) == 0) { $buf .= "\\\\000"; }
---
(Continue reading)

Alon Goldshuv | 2 Aug 2005 17:03
Favicon

Re: [PATCHES] COPY FROM performance improvements

Tom,

Thanks for pointing it out. I made the small required modifications to match
copy.c version 1.247 and sent it to -patches list. New patch is V16.

Alon.

On 8/1/05 7:51 PM, "Tom Lane" <tgl <at> sss.pgh.pa.us> wrote:

> "Alon Goldshuv" <agoldshuv <at> greenplum.com> writes:
>>> This patch appears to reverse out the most recent committed changes in
>>> copy.c.
> 
>> Which changes do you refer to? I thought I accommodated all the recent
>> changes (I recall some changes to the tupletable/tupleslot interface, HEADER
>> in cvs, and hex escapes and maybe one or 2 more). What did I miss?
> 
> The latest touch of copy.c, namely this patch:
> 
> 2005-07-10 17:13  tgl
> 
> * doc/src/sgml/ref/create_type.sgml, src/backend/commands/copy.c,
> src/backend/commands/typecmds.c, src/backend/tcop/fastpath.c,
> src/backend/tcop/postgres.c, src/backend/utils/adt/arrayfuncs.c,
> src/backend/utils/adt/date.c, src/backend/utils/adt/numeric.c,
> src/backend/utils/adt/rowtypes.c,
> src/backend/utils/adt/timestamp.c, src/backend/utils/adt/varbit.c,
> src/backend/utils/adt/varchar.c, src/backend/utils/adt/varlena.c,
> src/backend/utils/mb/mbutils.c, src/include/catalog/catversion.h,
> src/include/catalog/pg_proc.h,
(Continue reading)

Tobias Brox | 2 Aug 2005 18:04

"nice"/low priority Query

The short question: 

Is there any ways to give postgresql a hint that a
particular SQL call should be run at lower priority?  Since every db
connection has a pid, I can manually run "renice" to scheduele it by the OS
- but of course I can't do it manually all the time.

The long story:

We have a constantly growing database, and hence also a constantly growing
load on the database server.  A hardware upgrade has for different reasons
been postponed, and it's still beeing postponed.

We were hitting the first capacity problems in June, though so far I've
managed to keep the situation in check by tuning the configuration, adding
indices, optimizing queries, doing cacheing in the application, and at one
point in the code I'm even asking the database for "explain plan", grepping
out the estimated cost number, and referring the user to take contact with
the IT-dept if he really needs the report.  But I digress.

Still there are lots of CPU power available - normally the server runs with
50-80% of the CPUs idle, it's just the spikes that kills us.

We basically have two kind of queries that are significant - an ever-ongoing
"critical" rush of simple queries, both reading and writing to the database,
plus some few heavy "non-critical" read-only queries that may cause
significant iowait.  The problem comes when we are so unlucky that two or
three heavy queries are run simultaneously; we get congestion problems -
instead of the applications just running a bit slower, they run _much_
slower.
(Continue reading)

Tom Lane | 2 Aug 2005 18:19
Picon

Re: "nice"/low priority Query

Tobias Brox <tobias <at> nordicbet.com> writes:
> Is there any ways to give postgresql a hint that a
> particular SQL call should be run at lower priority?  Since every db
> connection has a pid, I can manually run "renice" to scheduele it by the OS
> - but of course I can't do it manually all the time.

And it won't help you anyway, because renice only affects CPU priority
not I/O scheduling ... which, by your description, is the real problem.

I think the only thing that's likely to help much is trying to arrange
that the "simple" queries only need to touch pages that are already in
memory.  Some playing around with shared_buffer sizing might help.
Also, if you're not on PG 8.0.*, an update might help.

			regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

               http://www.postgresql.org/docs/faq

Richard Huxton | 2 Aug 2005 18:54
Favicon

Re: "nice"/low priority Query

Tom Lane wrote:
> Tobias Brox <tobias <at> nordicbet.com> writes:
> 
>>Is there any ways to give postgresql a hint that a
>>particular SQL call should be run at lower priority?  Since every db
>>connection has a pid, I can manually run "renice" to scheduele it by the OS
>>- but of course I can't do it manually all the time.
> 
> And it won't help you anyway, because renice only affects CPU priority
> not I/O scheduling ... which, by your description, is the real problem.
> 
> I think the only thing that's likely to help much is trying to arrange
> that the "simple" queries only need to touch pages that are already in
> memory.  Some playing around with shared_buffer sizing might help.
> Also, if you're not on PG 8.0.*, an update might help.

Would it be useful to be able to re-use the vacuum_cost_xxx settings in 
8.0 for this sort of thing? I'm thinking a long-running report query 
isn't that different from a vacuum.

--
   Richard Huxton
   Archonet Ltd

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

               http://archives.postgresql.org

(Continue reading)

Jim C. Nasby | 2 Aug 2005 19:25

Re: "nice"/low priority Query

On Tue, Aug 02, 2005 at 12:19:30PM -0400, Tom Lane wrote:
> Tobias Brox <tobias <at> nordicbet.com> writes:
> > Is there any ways to give postgresql a hint that a
> > particular SQL call should be run at lower priority?  Since every db
> > connection has a pid, I can manually run "renice" to scheduele it by the OS
> > - but of course I can't do it manually all the time.
> 
> And it won't help you anyway, because renice only affects CPU priority
> not I/O scheduling ... which, by your description, is the real problem.

Actually, from what I've read 4.2BSD actually took priority into account
when scheduling I/O. I don't know if this behavior is still present in
FreeBSD or the like, though. So depending on the OS, priority could play
a role in determining I/O scheduling.
--

-- 
Jim C. Nasby, Database Consultant               decibel <at> decibel.org 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings


Gmane