Kevin Fallis | 1 Dec 2006 20:10

FW: query optimization on prepared statement through connection vi libpq

Let try this again…

 

From: Kevin Fallis
Sent: Friday, December 01, 2006 1:04 PM
To: 'pgsql-interfaces <at> postgresql.org'; 'pgsql-performance <at> postgresql.org'
Subject: query optimization on prepared statement through connection vi libpq

 

Background:

 

I have a connection pool around the libpq connection framework that I have set up that has a working knowledge of SQL statements that have been prepared.  It tracks if statements have been prepared or not and if they haven’t it will prepare them so I can optimize statements going through that connection.  When the statement is prepared, I am assuming the query planner does all the magic to figure out indexes to use and whatnot.  I am also assuming that at some point in time, if an ANALYZE or VACUUM ANALYZE is performed, all connections in my connection pool would re-optimize the plans so I get the best usage of indexes for the size of the entries in any table I am connected with.

 

Problem:

 

It appears that my connections (there could be many with prepared statements associated with them) are not falling to indexes that should otherwise be hit in a heavy load, high record count activity against a table.  I can see that sequential scans are rampant when in fact these selects should be indexed.  I have VACUUMED, swept and windexed the hell out of the tables and still I am seeing sequential scans.

 

Is there any reason I should consider a threading mechanism that re-prepares statements to reform the query plan info?  And..should not the ANALYZE adjust existing prepared queries for me?

 

Kevin Fallis

kfallis <at> tazznetworks.com

913.488.4705

Kevin Fallis | 1 Dec 2006 19:04

query optimization on prepared statement through connection vi libpq

Background:

 

I have a connection pool around the libpq connection framework that I have set up that has a working knowledge of SQL statements that have been prepared.  It tracks if statements have been prepared or not and if they haven’t it will prepare them so I can optimize statements going through that connection.  When the statement is prepared, I am assuming the query planner does all the magic to figure out indexes to use and whatnot.  I am also assuming that at some point in time, if an ANALYZE or VACUUM ANALYZE is performed, all connections in my connection pool would re-optimize the plans so I get the best usage of indexes for the size of the entries in any table I am connected with.

 

Problem:

 

It appears that my connections (there could be many with prepared statements associated with them) are not falling to indexes that should otherwise be hit in a heavy load, high record count activity against a table.  I can see that sequential scans are rampant when in fact these selects should be indexed.  I have VACUUMED, swept and windexed the hell out of the tables and still I am seeing sequential scans.

 

Is there any reason I should consider a threading mechanism that re-prepares statements to reform the query plan info?  And..should not the ANALYZE adjust existing prepared queries for me?

 

Kevin Fallis

kfallis <at> tazznetworks.com

913.488.4705

Anton | 5 Dec 2006 10:05
Picon

JOIN work somehow strange on simple query

Hi. I have a performance problem with this simple query:

SELECT collect_time FROM n_traffic JOIN n_logins USING (login_id)
WHERE n_logins.account_id = '1655' ORDER BY collect_time LIMIT 1;
----------------------------------------
Limit  (cost=0.00..2026.57 rows=1 width=8) (actual
time=5828.681..5828.681 rows=0 loops=1)
   ->  Nested Loop  (cost=0.00..901822.69 rows=445 width=8) (actual
time=5828.660..5828.660 rows=0 loops=1)
         ->  Index Scan using n_traffic_collect_time_login_id on
n_traffic  (cost=0.00..10280.58 rows=281608 width=12) (actual
time=0.026..1080.405 rows=281655 loops=1)
         ->  Index Scan using n_logins_pkey on n_logins
(cost=0.00..3.15 rows=1 width=4) (actual time=0.011..0.011 rows=0
loops=281655)
               Index Cond: ("outer".login_id = n_logins.login_id)
               Filter: (account_id = 1655)
 Total runtime: 5828.963 ms
(7 rows)

This looks very argly... I done some hack and change the
JOIN n_logins USING (login_id) WHERE n_logins.account_id = '1655'
with the
WHERE (login_id = '1' OR login_id = '2'...)
in script which prepare this ...OR...OR... list and form this query.
But THIS IS DIRTY HACK! There must be gooder way... Please, help,
explain!

If there is only one login_id сorresponds to some account_id the query
goes fast:

=# explain analyze SELECT collect_time FROM n_traffic JOIN n_logins
USING (login_id) WHERE n_logins.account_id= '15' ORDER BY collect_time
LIMIT 1;
---------------------------------------------------------
 Limit  (cost=1262.93..1262.94 rows=1 width=8) (actual
time=61.884..61.886 rows=1 loops=1)
   ->  Sort  (cost=1262.93..1263.49 rows=223 width=8) (actual
time=61.867..61.867 rows=1 loops=1)
         Sort Key: n_traffic.collect_time
         ->  Nested Loop  (cost=5.60..1254.23 rows=223 width=8)
(actual time=3.657..36.890 rows=4536 loops=1)
               ->  Index Scan using n_logins_account_id on n_logins
(cost=0.00..3.22 rows=1 width=4) (actual time=0.032..0.049 rows=1
loops=1)
                     Index Cond: (account_id = 15)
               ->  Bitmap Heap Scan on n_traffic  (cost=5.60..1241.72
rows=743 width=12) (actual time=3.601..19.012 rows=4536 loops=1)
                     Recheck Cond: (n_traffic.login_id = "outer".login_id)
                     ->  Bitmap Index Scan on n_traffic_login_id
(cost=0.00..5.60 rows=743 width=0) (actual time=3.129..3.129 rows=4536
loops=1)
                           Index Cond: (n_traffic.login_id = "outer".login_id)
 Total runtime: 63.697 ms
(11 rows)

Tables:

=# \d n_traffic
                         Table "public.n_traffic"
    Column    |            Type             |          Modifiers
--------------+-----------------------------+------------------------------
 login_id     | integer                     | not null
 traftype_id  | integer                     | not null
 collect_time | timestamp without time zone | not null default now()
 bytes_in     | bigint                      | not null default (0)::bigint
 bytes_out    | bigint                      | not null default (0)::bigint
Indexes:
    "n_traffic_login_id_key" UNIQUE, btree (login_id, traftype_id, collect_time)
    "n_traffic_collect_time" btree (collect_time)
    "n_traffic_collect_time_login_id" btree (collect_time, login_id)
    "n_traffic_login_id" btree (login_id)
    "n_traffic_login_id_collect_time" btree (login_id, collect_time)
Foreign-key constraints:
    "n_traffic_login_id_fkey" FOREIGN KEY (login_id) REFERENCES
n_logins(login_id) ON UPDATE CASCADE
    "n_traffic_traftype_id_fkey" FOREIGN KEY (traftype_id) REFERENCES
n_traftypes(traftype_id) ON UPDATE CASCADE

=# \d n_logins
                                      Table "public.n_logins"
   Column   |          Type          |                          Modifiers
------------+------------------------+---------------------------------------------------
login_id   | integer                | not null default
nextval('n_logins_login_id_seq'::regclass)
 account_id | integer                | not null
 login      | character varying(255) | not null
 pwd        | character varying(128) |
Indexes:
    "n_logins_pkey" PRIMARY KEY, btree (login_id)
    "n_logins_login_key" UNIQUE, btree ("login")
    "n_logins_account_id" btree (account_id)
Foreign-key constraints:
    "n_logins_account_id_fkey" FOREIGN KEY (account_id) REFERENCES
n_accounts(account_id)
Triggers:
    tr_after_n_logins AFTER INSERT OR DELETE OR UPDATE ON n_logins FOR
EACH ROW EXECUTE PROCEDURE tr_f_after_n_logins()
    tr_before_n_logins BEFORE UPDATE ON n_logins FOR EACH ROW EXECUTE
PROCEDURE tr_f_before_n_logins()

My machine is 2xPIII 1 GHz + 1 Gb RAM + RAID5 on 6 SCSI disks. My settings is:
max_connections = 50
shared_buffers = 32768
temp_buffers = 57792
work_mem = 81920
maintenance_work_mem = 196608
max_fsm_pages = 262144
max_fsm_relations = 1000
wal_buffers = 64
checkpoint_segments = 4
checkpoint_timeout = 300
checkpoint_warning = 30
effective_cache_size = 6553
random_page_cost = 3
default_statistics_target = 800
log_rotation_age = 1440
log_line_prefix = '%t %u <at> %d '
stats_start_collector = on
stats_command_string = on
stats_block_level = on
stats_row_level = on
stats_reset_on_server_start = on
---
# ipcs -T
msginfo:
        msgmax:  16384  (max characters in a message)
        msgmni:     40  (# of message queues)
        msgmnb:   2048  (max characters in a message queue)
        msgtql:     40  (max # of messages in system)
        msgssz:      8  (size of a message segment)
        msgseg:   2048  (# of message segments in system)

shminfo:
        shmmax: 838860800       (max shared memory segment size)
        shmmin:       1 (min shared memory segment size)
        shmmni:     128 (max number of shared memory identifiers)
        shmseg:     128 (max shared memory segments per process)
        shmall:  204800 (max amount of shared memory in pages)

seminfo:
        semmni:    256  (# of semaphore identifiers)
        semmns:   2048  (# of semaphores in system)
        semmnu:    256  (# of undo structures in system)
        semmsl:     60  (max # of semaphores per id)
        semopm:    100  (max # of operations per semop call)
        semume:     10  (max # of undo entries per process)
        semusz:    100  (size in bytes of undo structure)
        semvmx:  32767  (semaphore maximum value)
        semaem:  16384  (adjust on exit max value)
---
kern.seminfo.semmni=256
kern.seminfo.semmns=2048
kern.seminfo.semmnu=256
---
postgresql have its own login class:

postgresql:\
        :openfiles-cur=768:\
        :tc=daemon:
daemon:\
        :ignorenologin:\
        :datasize=infinity:\
        :maxproc=infinity:\
        :openfiles-cur=128:\
        :stacksize-cur=8M:\
        :localcipher=blowfish,8:\
        :tc=default:
--

-- 
engineer

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend
Anton | 5 Dec 2006 10:38
Picon

Re: JOIN work somehow strange on simple query

> Hi. I have a performance problem with this simple query:
>
> SELECT collect_time FROM n_traffic JOIN n_logins USING (login_id)
> WHERE n_logins.account_id = '1655' ORDER BY collect_time LIMIT 1;

I must add that is occurs when there is no rows in n_traffic for these
login_id's. Where there is at least one (example, login_id='411'
belongs to account_id='1655') query goes fast:

=# INSERT INTO n_traffic VALUES ('411', '1', '2006-09-23 12:23:05', '0', '0');

=# explain analyze SELECT collect_time FROM n_traffic JOIN n_logins
USING (login_id) WHERE n_logins.account_id= '1655' ORDER BY
collect_time LIMIT 1;
------------------------------------
 Limit  (cost=0.00..2025.76 rows=1 width=8) (actual time=0.070..0.072
rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..913617.15 rows=451 width=8) (actual
time=0.065..0.065 rows=1 loops=1)
         ->  Index Scan using n_traffic_collect_time_login_id on
n_traffic  (cost=0.00..10418.19 rows=285290 width=12) (actual
time=0.026..0.026 rows=1 loops=1)
         ->  Index Scan using n_logins_pkey on n_logins
(cost=0.00..3.15 rows=1 width=4) (actual time=0.026..0.026 rows=1
loops=1)
               Index Cond: ("outer".login_id = n_logins.login_id)
               Filter: (account_id = 1655)
 Total runtime: 0.322 ms
(7 rows)

--

-- 
engineer

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Joost Kraaijeveld | 6 Dec 2006 08:04
Picon

Locking in PostgreSQL?

Does PostgreSQL lock the entire row in a table if I update only 1
column?

--

-- 
Groeten,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
web: www.askesis.nl

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Dave Cramer | 6 Dec 2006 13:29
Favicon

Re: [PERFORM] Locking in PostgreSQL?

Unless you specifically ask for it postgresql doesn't lock any rows  
when you update data.

Dave
On 6-Dec-06, at 2:04 AM, Joost Kraaijeveld wrote:

> Does PostgreSQL lock the entire row in a table if I update only 1
> column?
>
>
> -- 
> Groeten,
>
> Joost Kraaijeveld
> Askesis B.V.
> Molukkenstraat 14
> 6524NB Nijmegen
> tel: 024-3888063 / 06-51855277
> fax: 024-3608416
> web: www.askesis.nl
>
> ---------------------------(end of  
> broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match
>

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

               http://archives.postgresql.org/

Jens Schipkowski | 6 Dec 2006 14:20
Picon

Re: [PERFORM] Locking in PostgreSQL?

On Wed, 06 Dec 2006 13:29:37 +0100, Dave Cramer <pg <at> fastcrypt.com> wrote:

> Unless you specifically ask for it postgresql doesn't lock any rows when  
> you update data.
>
Thats not right. UPDATE will force a RowExclusiveLock to rows matching the  
WHERE clause, or all if no one is specified.
 <at> Joost Kraaijeveld: Yes, because there is no EntryExclusiveLock or  
something like that. Roughly you can say, each UPDATE statement iterates  
through the affected table and locks the WHERE clause matching records  
(rows) exclusivly to prevent data inconsistancy during the UPDATE. After  
that your rows will be updated and the lock will be repealed.
You can see this during an long lasting UPDATE by querying the pg_locks  
with joined pg_stats_activity (statement must be enabled).

> Dave
> On 6-Dec-06, at 2:04 AM, Joost Kraaijeveld wrote:
>
>> Does PostgreSQL lock the entire row in a table if I update only 1
>> column?
>>
>>
>> --Groeten,
>>
>> Joost Kraaijeveld
>> Askesis B.V.
>> Molukkenstraat 14
>> 6524NB Nijmegen
>> tel: 024-3888063 / 06-51855277
>> fax: 024-3608416
>> web: www.askesis.nl
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 9: In versions below 8.0, the planner will ignore your desire to
>>        choose an index scan if your joining column's datatypes do not
>>        match
>>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match

CU,
Jens

--

-- 
**
Jens Schipkowski

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match

Dave Cramer | 6 Dec 2006 14:26
Favicon

Re: [PERFORM] Locking in PostgreSQL?


On 6-Dec-06, at 8:20 AM, Jens Schipkowski wrote:

> On Wed, 06 Dec 2006 13:29:37 +0100, Dave Cramer <pg <at> fastcrypt.com>  
> wrote:
>
>> Unless you specifically ask for it postgresql doesn't lock any  
>> rows when you update data.
>>
> Thats not right. UPDATE will force a RowExclusiveLock to rows  
> matching the WHERE clause, or all if no one is specified.
>  <at> Joost Kraaijeveld: Yes, because there is no EntryExclusiveLock or  
> something like that. Roughly you can say, each UPDATE statement  
> iterates through the affected table and locks the WHERE clause  
> matching records (rows) exclusivly to prevent data inconsistancy  
> during the UPDATE. After that your rows will be updated and the  
> lock will be repealed.
> You can see this during an long lasting UPDATE by querying the  
> pg_locks with joined pg_stats_activity (statement must be enabled).

Apparently I've completely misunderstood MVCC then.... My  
understanding is that unless you do a select ... for update then  
update the rows will not be locked .

Dave
>
>> Dave
>> On 6-Dec-06, at 2:04 AM, Joost Kraaijeveld wrote:
>>
>>> Does PostgreSQL lock the entire row in a table if I update only 1
>>> column?
>>>
>>>
>>> --Groeten,
>>>
>>> Joost Kraaijeveld
>>> Askesis B.V.
>>> Molukkenstraat 14
>>> 6524NB Nijmegen
>>> tel: 024-3888063 / 06-51855277
>>> fax: 024-3608416
>>> web: www.askesis.nl
>>>
>>> ---------------------------(end of  
>>> broadcast)---------------------------
>>> TIP 9: In versions below 8.0, the planner will ignore your desire to
>>>        choose an index scan if your joining column's datatypes do  
>>> not
>>>        match
>>>
>>
>>
>> ---------------------------(end of  
>> broadcast)---------------------------
>> TIP 9: In versions below 8.0, the planner will ignore your desire to
>>        choose an index scan if your joining column's datatypes do not
>>        match
>
>
>
> CU,
> Jens
>
> -- 
> **
> Jens Schipkowski
>
> ---------------------------(end of  
> broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match

Markus Schiltknecht | 6 Dec 2006 14:42
Picon
Favicon
Gravatar

Re: [PERFORM] Locking in PostgreSQL?

Hi,

Dave Cramer wrote:
> Apparently I've completely misunderstood MVCC then.... 

Probably not. You are both somewhat right.

Jens Schipkowski wrote:
 >> Thats not right. UPDATE will force a RowExclusiveLock to rows
 >> matching the WHERE clause, or all if no one is specified.

That almost right, RowExclusiveLock is a table level lock. An UPDATE 
acquires that, yes. Additionally there are row-level locks, which is 
what you're speaking about. An UPDATE gets an exclusive row-level lock 
on rows it updates.

Please note however, that these row-level locks only block concurrent 
writers, not readers (MVCC lets the readers see the old, unmodified row).

> My understanding 
> is that unless you do a select ... for update then update the rows will 
> not be locked.

Also almost right, depending on what you mean by 'locked'. A plain 
SELECT acquires an ACCESS SHARE lock on the table, but no row-level 
locks. Only a SELECT ... FOR UPDATE does row-level locking (shared ones 
here...)

The very fine documentation covers that in [1].

Regards

Markus

[1]: PostgreSQL Documentation, Explicit Locking:
http://www.postgresql.org/docs/8.2/interactive/explicit-locking.html

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Martijn van Oosterhout | 6 Dec 2006 14:45
Favicon

Re: [PERFORM] Locking in PostgreSQL?

On Wed, Dec 06, 2006 at 08:26:45AM -0500, Dave Cramer wrote:
> >>Unless you specifically ask for it postgresql doesn't lock any  
> >>rows when you update data.
> >>
> >Thats not right. UPDATE will force a RowExclusiveLock to rows  
> >matching the WHERE clause, or all if no one is specified.
> 
> Apparently I've completely misunderstood MVCC then.... My  
> understanding is that unless you do a select ... for update then  
> update the rows will not be locked .

I think it comes down to what you mean by RowExclusiveLock. In MVCC,
writers don't block readers, so even if someone executes an update on a
row, readers (SELECT statements) will not be blocked.

So it's not a lock as such, more a "I've updated this row, go find the
new version if that's appropriate for your snapshot".

Have a nice day,
--

-- 
Martijn van Oosterhout   <kleptog <at> svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Gmane