Bruce McAlister | 16 Mar 2007 20:06
Picon
Favicon

Re: PostgreSQL 8.2.3 VACUUM Timings/Performance

Hi All,

Okay, I'm getting a little further now. I'm about to create entries in the 
pg_autovacuum system tables. However, I'm a little confused as to how I go 
about finding out the OID value of the tables. The pg_autovacuum table 
requires the OID of the table you want to create settings for (vacrelid). 
Can anyone shed some light on how I can extract the OID of the table? Also, 
what happens if you create a table without OID's, are you still able to add 
it's details in the pg_autovacuum table if there is no OID associated with a 
table?

      Name Type References Description
      vacrelid oid pg_class.oid The table this entry is for
      enabled bool   If false, this table is never autovacuumed
      vac_base_thresh integer   Minimum number of modified tuples before 
vacuum
      vac_scale_factor float4   Multiplier for reltuples to add to 
vac_base_thresh
      anl_base_thresh integer   Minimum number of modified tuples before 
analyze
      anl_scale_factor float4   Multiplier for reltuples to add to 
anl_base_thresh
      vac_cost_delay integer   Custom vacuum_cost_delay parameter
      vac_cost_limit integer   Custom vacuum_cost_limit parameter
      freeze_min_age integer   Custom vacuum_freeze_min_age parameter
      freeze_max_age integer   Custom autovacuum_freeze_max_age parameter

Thanks
Bruce

(Continue reading)

Kevin Grittner | 23 Mar 2007 18:01
Favicon

EXISTS optimization

I'm posting this to performance in case our workaround may be of benefit to someone with a similar issue.  I'm
posting to hackers because I hope we can improve our planner in this area so that a workaround is not
necessary.  (It might make sense to reply to one group or the other, depending on reply content.)

We are converting from a commercial database (which shall remain unnamed here, due to license
restrictions on publishing benchmarks).  Most queries run faster on PostgreSQL; a small number choose
very poor plans and run much longer.  This particular query runs on the commercial product in 6.1s first
time, 1.4s cached.  In PostgreSQL it runs in about 144s both first time and cached.  I was able to use an easy
but fairly ugly rewrite (getting duplicate rows and eliminating them with DISTINCT) which runs on the
commercial product in 9.2s/3.0s and in PostgreSQL in 2.0s/0.7s.

Here are the tables:

          Table "public.TranHeader"
    Column     |       Type       | Modifiers
---------------+------------------+-----------
 tranNo        | "TranNoT"        | not null
 countyNo      | "CountyNoT"      | not null
 acctPd        | "DateT"          | not null
 date          | "DateT"          | not null
 isComplete    | boolean          | not null
 tranId        | "TranIdT"        | not null
 tranType      | "TranTypeT"      | not null
 userId        | "UserIdT"        | not null
 workstationId | "WorkstationIdT" | not null
 time          | "TimeT"          |
Indexes:
    "TranHeader_pkey" PRIMARY KEY, btree ("tranNo", "countyNo")
    "TranHeader_TranAcctPeriod" UNIQUE, btree ("acctPd", "tranNo", "countyNo")
    "TranHeader_TranDate" UNIQUE, btree (date, "tranNo", "countyNo")
(Continue reading)

Tom Lane | 23 Mar 2007 22:49
Picon

Re: EXISTS optimization

"Kevin Grittner" <Kevin.Grittner <at> wicourts.gov> writes:
> explain analyze
> SELECT "A"."adjustmentNo", "A"."tranNo", "A"."countyNo", "H"."date", "H"."userId", "H"."time"
>   FROM "Adjustment" "A"
>   JOIN "TranHeader" "H" ON ("H"."tranId" = "A"."adjustmentNo" AND "H"."countyNo" = "A"."countyNo" AND
"H"."tranNo" = "A"."tranNo")
>   WHERE "H"."tranType" = 'A'
>     AND "A"."date" > DATE '2006-01-01'
>     AND "H"."countyNo" = 66
>     AND "A"."countyNo" = 66
>     AND EXISTS
>         (
>           SELECT 1 FROM "TranDetail" "D"
>             WHERE "D"."tranNo" = "H"."tranNo"
>               AND "D"."countyNo" = "H"."countyNo"
>               AND "D"."caseNo" LIKE '2006TR%'
>         )
> ;

> The commercial product scans the index on caseNo in TranDetail to build a work table of unique values, then
uses indexed access to the TranHeader and then to Adjustment.

If you want that, try rewriting the EXISTS to an IN:

   AND ("H"."tranNo", "H"."countyNo") IN
        (
          SELECT "D"."tranNo", "D"."countyNo" FROM "TranDetail" "D"
            WHERE "D"."caseNo" LIKE '2006TR%'
        )

(Continue reading)

Martijn van Oosterhout | 23 Mar 2007 23:26
Favicon

Re: EXISTS optimization

On Fri, Mar 23, 2007 at 05:49:42PM -0400, Tom Lane wrote:
> We don't currently try to flatten EXISTS into a unique/join plan as we
> do for IN.  I seem to recall not doing so when I rewrote IN planning
> because I didn't think it would be exactly semantically equivalent,
> but that was awhile ago.  Right at the moment it seems like it ought
> to be equivalent as long as the comparison operators are strict.

Wasn't it due to the fact that IN needs to scan through all
possibilites anyway because of its interaction with NULL, whereas
EXISTS can stop at the first row?

That would mean the subquery to be materialised would not be equivalent
if it called any non-immutable functions. It's also much less clear to
be a win in the EXISTs case. But then, that's a costs issue the planner
can deal with...

Have a nice day,
--

-- 
Martijn van Oosterhout   <kleptog <at> svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
Kevin Grittner | 23 Mar 2007 23:26
Favicon

Re: EXISTS optimization


>>> On Fri, Mar 23, 2007 at  4:49 PM, in message <25339.1174686582 <at> sss.pgh.pa.us>,
Tom Lane <tgl <at> sss.pgh.pa.us> wrote: 
> "Kevin Grittner" <Kevin.Grittner <at> wicourts.gov> writes:
>> explain analyze
>> SELECT "A"."adjustmentNo", "A"."tranNo", "A"."countyNo", "H"."date", 
> "H"."userId", "H"."time"
>>   FROM "Adjustment" "A"
>>   JOIN "TranHeader" "H" ON ("H"."tranId" = "A"."adjustmentNo" AND 
> "H"."countyNo" = "A"."countyNo" AND "H"."tranNo" = "A"."tranNo")
>>   WHERE "H"."tranType" = 'A'
>>     AND "A"."date" > DATE '2006- 01- 01'
>>     AND "H"."countyNo" = 66
>>     AND "A"."countyNo" = 66
>>     AND EXISTS
>>         (
>>           SELECT 1 FROM "TranDetail" "D"
>>             WHERE "D"."tranNo" = "H"."tranNo"
>>               AND "D"."countyNo" = "H"."countyNo"
>>               AND "D"."caseNo" LIKE '2006TR%'
>>         )
>> ;
> 
>> The commercial product scans the index on caseNo in TranDetail to build a 
> work table of unique values, then uses indexed access to the TranHeader and 
> then to Adjustment.
> 
> If you want that, try rewriting the EXISTS to an IN:
> 
>    AND ("H"."tranNo", "H"."countyNo") IN
(Continue reading)

Kevin Grittner | 23 Mar 2007 23:30
Favicon

Re: EXISTS optimization

I don't understand -- TRUE OR UNKNOWN evaluates to TRUE, so why would the IN need to continue?  I'm not quite
following the rest; could you elaborate or give an example?  (Sorry if I'm lagging behind the rest of the
class here.)

-Kevin

 
>>> Martijn van Oosterhout <kleptog <at> svana.org> 03/23/07 5:26 PM >>> 
On Fri, Mar 23, 2007 at 05:49:42PM -0400, Tom Lane wrote:
> We don't currently try to flatten EXISTS into a unique/join plan as we
> do for IN.  I seem to recall not doing so when I rewrote IN planning
> because I didn't think it would be exactly semantically equivalent,
> but that was awhile ago.  Right at the moment it seems like it ought
> to be equivalent as long as the comparison operators are strict.

Wasn't it due to the fact that IN needs to scan through all
possibilites anyway because of its interaction with NULL, whereas
EXISTS can stop at the first row?

That would mean the subquery to be materialised would not be equivalent
if it called any non-immutable functions. It's also much less clear to
be a win in the EXISTs case. But then, that's a costs issue the planner
can deal with...

Have a nice day,
--

-- 
Martijn van Oosterhout   <kleptog <at> svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

---------------------------(end of broadcast)---------------------------
(Continue reading)

Kevin Grittner | 23 Mar 2007 23:37
Favicon

Re: [PERFORM] EXISTS optimization


>>> On Fri, Mar 23, 2007 at  5:26 PM, in message
<46040DAC.EE98.0025.0 <at> wicourts.gov>, "Kevin Grittner"
<Kevin.Grittner <at> wicourts.gov> wrote: 

> I tried something which seems 
> equivalent, but it is running for a very long time.  I'll show it with just 
> the explain while I wait to see how long the explain analyze takes.
>  
> explain
> SELECT "A"."adjustmentNo", "A"."tranNo", "A"."countyNo", "H"."date", 
> "H"."userId", "H"."time"
>   FROM "Adjustment" "A"
>   JOIN "TranHeader" "H" ON ("H"."tranId" = "A"."adjustmentNo" AND 
> "H"."countyNo" = "A"."countyNo" AND "H"."tranNo" = "A"."tranNo")
>   WHERE "H"."tranType" = 'A'
>     AND "A"."date" > DATE '2006-  01-  01'
>     AND "H"."countyNo" = 66
>     AND "A"."countyNo" = 66
>     AND "H"."tranNo" IN
>         (
>           SELECT "D"."tranNo" FROM "TranDetail" "D"
>             WHERE "D"."caseNo" LIKE '2006TR%'
>               AND "D"."countyNo" = "H"."countyNo"
>         )
> ;

explain analyze results:

 Nested Loop  (cost=0.00..181673.08 rows=1 width=46) (actual time=42224.077..964266.969 rows=2209 loops=1)
(Continue reading)

Tom Lane | 24 Mar 2007 00:04
Picon

Re: EXISTS optimization

"Kevin Grittner" <Kevin.Grittner <at> wicourts.gov> writes:
> Tom Lane <tgl <at> sss.pgh.pa.us> wrote: 
>> If you want that, try rewriting the EXISTS to an IN:
>> 
>> AND ("H"."tranNo", "H"."countyNo") IN
>> (
>> SELECT "D"."tranNo", "D"."countyNo" FROM "TranDetail" "D"
>> WHERE "D"."caseNo" LIKE '2006TR%'
>> )

> That's the good news.  The bad news is that I operate under a
> management portability dictate which doesn't currently allow that
> syntax, since not all of the products they want to cover support it.

Which part of it don't they like --- the multiple IN-comparisons?

> I tried something which seems equivalent, but it is running for a very
> long time.
>     AND "H"."tranNo" IN
>         (
>           SELECT "D"."tranNo" FROM "TranDetail" "D"
>             WHERE "D"."caseNo" LIKE '2006TR%'
>               AND "D"."countyNo" = "H"."countyNo"
>         )

No, that's not gonna accomplish a darn thing, because you've still got
a correlated subquery (ie, a reference to outer "H") and so turning the
IN into a join doesn't work.

			regards, tom lane
(Continue reading)

Kevin Grittner | 24 Mar 2007 03:28
Favicon

Re: [PERFORM] EXISTS optimization

>>> On Fri, Mar 23, 2007 at  6:04 PM, in message
<b6e8f2e80703231604v72b9dc4dr51eebd62274d53ec <at> mail.gmail.com>, "Peter Kovacs"
<peter.kovacs.1.0rc <at> gmail.com> wrote: 
> On 3/23/07, Kevin Grittner <Kevin.Grittner <at> wicourts.gov> wrote:
> [...]
>> That's the good news.  The bad news is that I operate under a management 
> portability dictate which doesn't currently allow that syntax, since not all 
> of the products they want to
> 
> It doesn't really touch the substance, but I am curious: are you not
> even allowed to discriminate between products in your code like:
> if db is 'postresql' then
> ...
> else
> ...
> ?
> 
> What would be the rationale for that?

Anybody who's not curious about that should skip the rest of this email.

Management has simply given a mandate that the software be independent of OS and database vendor, and to use
Java to help with the OS independence.  I have to admit that I am the architect of the database independence
solution that was devised.  (The choice of Java for the OS independence has been very successful.  We have
run our bytecode on HP-UX, Windows, Sun Solaris, and various flavors of Linux without having to compile
different versions of the bytecode.  Other than when people get careless with case sensitivity on file
names or with path separators, it just drops right in and runs.

For the data side, we write all of our queries in ANSI SQL in our own query tool, parse it, and generate Java
classes to run it.  The ANSI source is broken down to "lowest common denominator" queries, with all
(Continue reading)

Martijn van Oosterhout | 24 Mar 2007 14:07
Favicon

Re: EXISTS optimization

On Fri, Mar 23, 2007 at 05:30:27PM -0500, Kevin Grittner wrote:
> I don't understand -- TRUE OR UNKNOWN evaluates to TRUE, so why would
> the IN need to continue?  I'm not quite following the rest; could you
> elaborate or give an example?  (Sorry if I'm lagging behind the rest
> of the class here.)

You're right, I'm getting confused with the interaction of NULL and NOT
IN.

The multiple evaluation thing still applies, but that's minor.

Have a nice day,
--

-- 
Martijn van Oosterhout   <kleptog <at> svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Gmane