Simon Riggs | 1 Jun 2008 11:10
Favicon
Gravatar

Re: 2GB or not 2GB


On Sat, 2008-05-31 at 11:53 -0700, Josh Berkus wrote:
> Simon,
> 
> > There is an optimum for each specific sort.
> 
> Well, if the optimum is something other than "as much as we can get", then we 
> still have a pretty serious issue with work_mem, no?

Depends upon your view of serious I suppose. I would say it is an
acceptable situation, but needs further optimization. I threw some ideas
around on Hackers around Dec/New Year, but I don't have time to work on
this further myself in this dev cycle. Further contributions welcome.

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

--

-- 
Sent via pgsql-performance mailing list (pgsql-performance <at> postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Simon Riggs | 2 Jun 2008 19:10
Favicon
Gravatar

Re: Outer joins and equivalence


On Tue, 2008-05-27 at 17:43 -0400, Tom Lane wrote:
> Simon Riggs <simon <at> 2ndquadrant.com> writes:
> > I have a complex query where making a small change to the SQL increases
> > run-time by > 1000 times.
> 
> > The first SQL statement is of the form
> 
> > A JOIN B ON (a.id = b.id) LEFT JOIN C ON (a.id = c.id) 
> 
> > and the second is like this
> 
> > A JOIN B ON (a.id = b.id) LEFT JOIN C ON (b.id = c.id)
> 
> > the only difference is the substitution of a -> b
> 
> Please provide an actual test case.

Getting closer, but still not able to produce a moveable test case.

Symptoms are

* using partitioning
* when none of the partitions are excluded
* when equivalence classes ought to be able to reconcile join 

Still working on it

--

-- 
 Simon Riggs           www.2ndQuadrant.com
(Continue reading)

Simon Riggs | 2 Jun 2008 21:47
Favicon
Gravatar

Re: Outer joins and equivalence


On Mon, 2008-06-02 at 18:10 +0100, Simon Riggs wrote:
> On Tue, 2008-05-27 at 17:43 -0400, Tom Lane wrote:
> > Simon Riggs <simon <at> 2ndquadrant.com> writes:
> > > I have a complex query where making a small change to the SQL increases
> > > run-time by > 1000 times.
> > 
> > > The first SQL statement is of the form
> > 
> > > A JOIN B ON (a.id = b.id) LEFT JOIN C ON (a.id = c.id) 
> > 
> > > and the second is like this
> > 
> > > A JOIN B ON (a.id = b.id) LEFT JOIN C ON (b.id = c.id)
> > 
> > > the only difference is the substitution of a -> b
> > 
> > Please provide an actual test case.
> 
> Getting closer, but still not able to produce a moveable test case.

I've got a test case which shows something related and weird, though not
the exact case.

The queries shown here have significantly different costs, depending
upon whether we use tables a or b in the query. Since a and b are
equivalent this result isn't expected at all.

I suspect the plan variation in the original post is somehow cost
related and we are unlikely to discover the exact plan.
(Continue reading)

Justin | 2 Jun 2008 23:43
Favicon

getting estimated cost to agree with actual

As i've been looking over the more complicated queries that i have 
written and gotten allot of help in redoing the quires from you all, 
thanks again.

I have noticed that  estimated Cost to do the query is way off from 
Actual.  The queries  don't run slow at least not to me. 
The Estimated Cost is way higher than the actual time on Hash joins but 
on the scan through the tables the Estimate Cost to  Actual flips where 
Actual is way higher than Estimated Cost

I have tried increasing and decreasing the Stats on the important 
columns with no changes

Changed the stats from 10 to 50, 100 and 150, 200 and 250.

The Estimated Cost always stays the same.  What is the process to track 
down what is going on why the estimate cost is off

----------------Query/View----------------
SELECT (wo.wo_number::text || '-'::text) || wo.wo_subnumber::text AS 
wo_number, wo.wo_qtyord, 'Labor' AS valuetype, item.item_number AS 
parentitem, wooper.wooper_descrip1 AS wooper_des, 
wooperpost.wooperpost_seqnumber AS wooperpost, wooperpost.wooperpost_qty 
AS qty, wooperpost.wooperpost_sutime AS setuptime_matcost, 
wooperpost.wooperpost_sucost AS setupcost_issuecost, 
wooperpost.wooperpost_rntime AS runtime_scrapqty, 
wooperpost.wooperpost_rncost AS runcost_scrapcost, wo.wo_status, 
wooperpost.wooperpost_timestamp::date AS opposteddate, 
wo.wo_completed_date::date AS wocompletedate, 
wo.wo_processstart_date::date AS wostarteddated
(Continue reading)

Scott Marlowe | 3 Jun 2008 01:38
Picon

Re: getting estimated cost to agree with actual

On Mon, Jun 2, 2008 at 3:43 PM, Justin <justin <at> emproshunts.com> wrote:
> As i've been looking over the more complicated queries that i have written
> and gotten allot of help in redoing the quires from you all, thanks again.
>
> I have noticed that  estimated Cost to do the query is way off from Actual.
>  The queries  don't run slow at least not to me. The Estimated Cost is way
> higher than the actual time on Hash joins but on the scan through the tables
> the Estimate Cost to  Actual flips where Actual is way higher than Estimated
> Cost
>
> I have tried increasing and decreasing the Stats on the important columns
> with no changes

Well, they're not measured in the same units.  estimated costs are in
terms of the cost to sequentially scan a single tuple, while actual
costs are in milliseconds.

You might be able to change the cost of sequential scan from 1 to
something else and everything else to reflect that change to get them
close.  But they aren't supposed to match directly up.

--

-- 
Sent via pgsql-performance mailing list (pgsql-performance <at> postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Gregory Stark | 3 Jun 2008 03:20
Favicon

Re: getting estimated cost to agree with actual

"Scott Marlowe" <scott.marlowe <at> gmail.com> writes:

> On Mon, Jun 2, 2008 at 3:43 PM, Justin <justin <at> emproshunts.com> wrote:
>>
>> I have noticed that  estimated Cost to do the query is way off from Actual.
>
> Well, they're not measured in the same units.  estimated costs are in
> terms of the cost to sequentially scan a single tuple, while actual
> costs are in milliseconds.

s/tuple/page/

-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning

--

-- 
Sent via pgsql-performance mailing list (pgsql-performance <at> postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Scott Marlowe | 3 Jun 2008 05:06
Picon

Re: getting estimated cost to agree with actual

On Mon, Jun 2, 2008 at 7:20 PM, Gregory Stark <stark <at> enterprisedb.com> wrote:
> "Scott Marlowe" <scott.marlowe <at> gmail.com> writes:
>
>> On Mon, Jun 2, 2008 at 3:43 PM, Justin <justin <at> emproshunts.com> wrote:
>>>
>>> I have noticed that  estimated Cost to do the query is way off from Actual.
>>
>> Well, they're not measured in the same units.  estimated costs are in
>> terms of the cost to sequentially scan a single tuple, while actual
>> costs are in milliseconds.
>
> s/tuple/page/

Dangit!  I knew that too.  time for some sleep I guess. :)

--

-- 
Sent via pgsql-performance mailing list (pgsql-performance <at> postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Marcin Citowicki | 3 Jun 2008 09:57
Picon
Favicon

query performance question

Hello,

I have a table (transactions) containing 61 414 503 rows. The basic 
count query (select count(transid) from transactions) takes 138226 
milliseconds.
This is the query analysis output:

Aggregate  (cost=2523970.79..2523970.80 rows=1 width=8) (actual 
time=268964.088..268964.090 rows=1 loops=1);
  ->  Seq Scan on transactions  (cost=0.00..2370433.43 rows=61414943 
width=8) (actual time=13.886..151776.860 rows=61414503 loops=1);
Total runtime: 268973.248 ms;

Query has several indexes defined, including one on transid column:

non-unique;index-qualifier;index-name;type;ordinal-position;column-name;asc-or-desc;cardinality;pages;filter-condition

f;<null>;transactions_id_key;3;1;transid;<null>;61414488;168877;<null>;
t;<null>;trans_ip_address_index;3;1;ip_address;<null>;61414488;168598;<null>;
t;<null>;trans_member_id_index;3;1;member_id;<null>;61414488;169058;<null>;
t;<null>;trans_payment_id_index;3;1;payment_id;<null>;61414488;168998;<null>;
t;<null>;trans_status_index;3;1;status;<null>;61414488;169005;<null>;
t;<null>;transactions__time_idx;3;1;time;<null>;61414488;168877;<null>;
t;<null>;transactions_offer_id_idx;3;1;offer_id;<null>;61414488;169017;<null>;

I'm not a dba so I'm not sure if the time it takes to execute this query 
is OK or not, it just  seems a bit long to me.
I'd appreciate it if someone could share his/her thoughts on this. Is 
there a way to make this table/query perform better?
Any query I'm running that joins with transactions table takes forever 
(Continue reading)

hubert depesz lubaczewski | 3 Jun 2008 10:31
Gravatar

Re: query performance question

On Tue, Jun 03, 2008 at 09:57:15AM +0200, Marcin Citowicki wrote:
> I'm not a dba so I'm not sure if the time it takes to execute this query 
> is OK or not, it just  seems a bit long to me.

This is perfectly OK. count(*) from table is generally slow. There are
some ways to make it faster (depending if you need exact count, or some
estimate).

> I'd appreciate it if someone could share his/her thoughts on this. Is 
> there a way to make this table/query perform better?

You can keep the count of elements in this table in separate table, and
update it with triggers.

> Any query I'm running that joins with transactions table takes forever 
> to complete, but maybe this is normal for a table this size.

As for other queries - show them, and their explain analyze.

Performance of count(*) is dependent basically only on size of table. In
case of other queries - it might be simple to optimize them. Or
impossible - without knowing the queries it's impossible to tell.

Do you really care about count(*) from 60m+ record table? How often do
you count the records?

Best regards,

depesz

(Continue reading)

Marcin Citowicki | 3 Jun 2008 10:55
Picon
Favicon

Re: query performance question

Hello Hubert,

Thank you for your reply. I don't really need to count rows in transactions table, I just thought this was a good example to show how slow the query was.
But based on what you wrote it looks like count(*) is slow in general, so this seems to be OK since the table is rather large.
I just ran other queries (joining transactions table) and they returned quickly, which leads me to believe that there could be a problem not with the database, but with the box
the db is running on. Sometimes those same queries take forever and now they complete in no time at all, so perhaps there is a process that is running periodically which is slowing the db down.
I'll need to take a look at this.
Thank you for your help!

Marcin


hubert depesz lubaczewski wrote:
On Tue, Jun 03, 2008 at 09:57:15AM +0200, Marcin Citowicki wrote:
I'm not a dba so I'm not sure if the time it takes to execute this query is OK or not, it just seems a bit long to me.
This is perfectly OK. count(*) from table is generally slow. There are some ways to make it faster (depending if you need exact count, or some estimate).
I'd appreciate it if someone could share his/her thoughts on this. Is there a way to make this table/query perform better?
You can keep the count of elements in this table in separate table, and update it with triggers.
Any query I'm running that joins with transactions table takes forever to complete, but maybe this is normal for a table this size.
As for other queries - show them, and their explain analyze. Performance of count(*) is dependent basically only on size of table. In case of other queries - it might be simple to optimize them. Or impossible - without knowing the queries it's impossible to tell. Do you really care about count(*) from 60m+ record table? How often do you count the records? Best regards, depesz
Attachment (marcin_citowicki.vcf): text/x-vcard, 142 bytes

--

-- 
Sent via pgsql-performance mailing list (pgsql-performance <at> postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Gmane