Jesper Krogh | 1 Jan 12:48 2010

Message queue table - strange performance drop with changing limit size.

Hi.

I have a "message queue" table that contains on the order of 1-10m
"messages". It is implemented using TheSchwartz:
http://search.cpan.org/~bradfitz/TheSchwartz-1.07/lib/TheSchwartz.pm

When a "worker" picks the next job, it goes into the "job" table and
selects the top X highest-priority messages with a "funcid" that it can
work on. The table looks like this:
db=# \d workqueue.job
                                  Table "workqueue.job"
    Column     |   Type   |                           Modifiers

---------------+----------+---------------------------------------------------------------
 jobid         | integer  | not null default
nextval('workqueue.job_jobid_seq'::regclass)
 funcid        | integer  | not null
 arg           | bytea    |
 uniqkey       | text     |
 insert_time   | integer  |
 run_after     | integer  | not null
 grabbed_until | integer  | not null
 priority      | smallint |
 coalesce      | text     |
Indexes:
    "job_funcid_key" UNIQUE, btree (funcid, uniqkey)
    "funcid_coalesce_priority" btree (funcid, "coalesce", priority)
    "funcid_prority_idx2" btree (funcid, priority)
    "job_jobid_idx" btree (jobid)
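
The grab step described above boils down to something like the following sketch (hypothetical values throughout: funcid 42, a 10-row batch, a 300-second lease, and an example jobid; this is an illustration, not TheSchwartz's actual queries):

```sql
-- Fetch the top-10 runnable candidates for one funcid, best priority first.
SELECT jobid, arg
  FROM workqueue.job
 WHERE funcid = 42
   AND run_after     <= extract(epoch FROM now())::integer
   AND grabbed_until <= extract(epoch FROM now())::integer
 ORDER BY priority DESC
 LIMIT 10;

-- The worker then tries to claim one of the returned rows; the grab only
-- succeeds if no other worker took it in the meantime (0 rows updated = retry).
UPDATE workqueue.job
   SET grabbed_until = extract(epoch FROM now())::integer + 300
 WHERE jobid = 1234
   AND grabbed_until <= extract(epoch FROM now())::integer;
```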


Greg Williamson | 1 Jan 15:53 2010

Re: Message queue table - strange performance drop with changing limit size.

Jesper --

I apologize for top-quoting -- a challenged reader. 

This doesn't directly address your question, but I can't help noticing that the estimates for rows are
_wildly_ off the actual numbers in each and every query. How often / recently have you run ANALYZE on this
table?

Are the timing results consistent over several runs? It is possible that caching effects are entering into
the timing results.

Greg Williamson

----- Original Message ----
From: Jesper Krogh <jesper <at> krogh.cc>
To: pgsql-performance <at> postgresql.org
Sent: Fri, January 1, 2010 3:48:43 AM
Subject: [PERFORM] Message queue table - strange performance drop with changing limit size. 

Hi.

I have a "message queue" table that contains on the order of 1-10m
"messages". It is implemented using TheSchwartz:
http://search.cpan.org/~bradfitz/TheSchwartz-1.07/lib/TheSchwartz.pm

When a "worker" picks the next job, it goes into the "job" table and
selects the top X highest-priority messages with a "funcid" that it can
work on. The table looks like this:
db=# \d workqueue.job
                                  Table "workqueue.job"

Jesper Krogh | 1 Jan 16:04 2010

Re: Message queue table - strange performance drop with changing limit size.

Greg Williamson wrote:
> Jesper --
> 
> I apologize for top-quoting -- a challenged reader.
> 
> This doesn't directly address your question, but I can't help
> noticing that the estimates for rows are _wildly_ off the actual
> numbers in each and every query. How often / recently have you run
> ANALYZE on this table?

It is actually rather accurate: what you see in the EXPLAIN ANALYZE is
the "limit" number kicking in, whereas the inner "rows" estimate is for
the where clause + filter.

> Are the timing results consistent over several runs? It is possible
> that caching effects are entering into the timing results.

Yes, they are very consistent. I have subsequently found out that it
depends on the number of "workers" doing it in parallel. I seem to top
out at around 12 processes.

I think I need to rewrite the message-queue stuff in a way that can take
advantage of some stored procedures instead. Currently it picks out the
"top X", randomizes it in the client, picks one and tries to "grab" it,
and starts over if that fails. When the "select top X" begins to consume
significant time itself, the process bites its own tail and gradually
gets worse.

The workload for the individual jobs is "small", ~1-2s.
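
That rewrite can be sketched as a single server-side statement (a hypothetical sketch against the workqueue.job table from the first message, with a made-up funcid of 42 and a 300-second lease; UPDATE ... RETURNING needs PostgreSQL 8.2+, and this is not TheSchwartz's code):

```sql
-- Claim the highest-priority free job for one funcid in one statement, so the
-- client-side select/randomize/grab/retry loop moves into the database.
-- The outer grabbed_until check keeps the claim safe under READ COMMITTED:
-- if a concurrent worker grabs the row first, the re-checked condition fails
-- and zero rows come back, which simply means "try again".
UPDATE workqueue.job
   SET grabbed_until = extract(epoch FROM now())::integer + 300
 WHERE grabbed_until <= extract(epoch FROM now())::integer
   AND jobid = (SELECT jobid
                  FROM workqueue.job
                 WHERE funcid = 42
                   AND run_after     <= extract(epoch FROM now())::integer
                   AND grabbed_until <= extract(epoch FROM now())::integer
                 ORDER BY priority DESC
                 LIMIT 1)
RETURNING jobid, arg;
```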


Dimitri Fontaine | 2 Jan 13:25 2010

Re: Message queue table - strange performance drop with changing limit size.

Jesper Krogh <jesper <at> krogh.cc> writes:
> I have a "message queue" table, that contains in the order of 1-10m
> "messages". It is implemented using TheSchwartz:
> http://search.cpan.org/~bradfitz/TheSchwartz-1.07/lib/TheSchwartz.pm

One way to approach queueing efficiently with PostgreSQL is to rely on
PGQ. The upcoming 3.0 version (alpha1 has been released) contains the
basics for cooperative consumers; the stable version (2.1.10) only
allows multiple consumers to all do the same work (think replication).

  http://wiki.postgresql.org/wiki/Skytools
  http://wiki.postgresql.org/wiki/PGQ_Tutorial

Regards,
-- 
dim

-- 
Sent via pgsql-performance mailing list (pgsql-performance <at> postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Greg Smith | 3 Jan 20:20 2010

Re: Message queue table - strange performance drop with changing limit size.

Jesper Krogh wrote:
> So what I see is that "top 10" takes < 1ms, top 50 takes over 500 times
> more, and top 1000 only 1.5 times more than top 50.
> What can the reason for the huge drop between limit 10 and limit 50 be?
>

Normally this means you're hitting much higher-performing cached 
behavior with the smaller amount, which degrades significantly once 
you've crossed some threshold.  L1 and L2 CPU caches vs. regular RAM, 
shared_buffers vs. OS cache changes, and cached in RAM vs. read from 
disk are three transition spots where you can hit a large drop in 
performance just by crossing some boundary, going from "just fits" to 
"doesn't fit and data thrashes around".  Larger data sets do not take a 
linearly larger amount of time to run queries against--they can easily 
degrade an order of magnitude faster than that.

> Indexes:
>     "job_funcid_key" UNIQUE, btree (funcid, uniqkey)
>     "funcid_coalesce_priority" btree (funcid, "coalesce", priority)
>     "funcid_prority_idx2" btree (funcid, priority)
>     "job_jobid_idx" btree (jobid)
>   

There may very well be an underlying design issue here though.  Indexes 
are far from free to maintain.  You've got a fair number of them with a 
lot of redundant information, which is adding a significant amount of 
overhead for questionable gains.  If you added those just from the 
theory of "those are the field combinations I search via", you really 
need to benchmark that design decision--it's rarely that easy to 
figure out what works best.  For example, if on average there are a

Madison Kelly | 4 Jan 20:10 2010

DB is slow until DB is reloaded

Hi all,

   I've got a fairly small DB (~850MB when pg_dump'ed) running on PgSQL
v8.1.11 on a CentOS 5.3 x86_64 Xen-based virtual machine. The filesystem is
ext3 on LVM with 32MB extents. It's about the only real resource-hungry
VM on the server.

   It slows down over time and I can't seem to find a way to get the
performance to return without doing a dump and reload of the database.
I've tried manually running 'VACUUM FULL' and restarting the postgresql
daemon without success.

For example, here is an actual query before the dump and again after the
dump (sorry for the large query):

-=] Before the dump/reload [=-
server <at> iwt=> EXPLAIN ANALYZE SELECT lor_order_type, lor_order_date,
lor_order_time, lor_isp_agent_id, lor_last_modified_date,
lor_isp_order_number, lor_instr_for_bell_rep, lor_type_of_service,
lor_local_voice_provider, lor_dry_loop_instr, lor_termination_location,
lor_req_line_speed, lor_server_from, lor_rate_band,
lor_related_order_nums, lor_related_order_types, lor_activation_date,
lor_cust_first_name, lor_cust_last_name, lor_req_activation_date,
lor_street_number, lor_street_number_suffix, lor_street_name,
lor_street_type, lor_street_direction, lor_location_type_1,
lor_location_number_1, lor_location_type_2, lor_location_number_2,
lor_postal_code, lor_municipality, lor_province, lor_customer_group,
lor_daytime_phone, lor_daytime_phone_ext, lod_value AS circuit_number
FROM line_owner_report LEFT JOIN line_owner_data ON (lor_id=lod_lo_id
AND lod_variable='ISPCircuitNumber1') WHERE lor_lo_id=514;

Gary Doades | 4 Jan 20:25 2010

Re: DB is slow until DB is reloaded

On 04/01/2010 7:10 PM, Madison Kelly wrote:
> Hi all,
>
>   I've got a fairly small DB (~850MB when pg_dump'ed) running on PgSQL
> v8.1.11 on a CentOS 5.3 x86_64 Xen-based virtual machine. The filesystem is
> ext3 on LVM with 32MB extents. It's about the only real resource-hungry
> VM on the server.
>
>   It slows down over time and I can't seem to find a way to get the
> performance to return without doing a dump and reload of the database.
> I've tried manually running 'VACUUM FULL' and restarting the postgresql
> daemon without success.
>
> For example, here is an actual query before the dump and again after the
> dump (sorry for the large query):
>
> -=] Before the dump/reload [=-
> server <at> iwt=> EXPLAIN ANALYZE SELECT lor_order_type, lor_order_date,
> lor_order_time, lor_isp_agent_id, lor_last_modified_date,
> lor_isp_order_number, lor_instr_for_bell_rep, lor_type_of_service,
> lor_local_voice_provider, lor_dry_loop_instr, lor_termination_location,
> lor_req_line_speed, lor_server_from, lor_rate_band,
> lor_related_order_nums, lor_related_order_types, lor_activation_date,
> lor_cust_first_name, lor_cust_last_name, lor_req_activation_date,
> lor_street_number, lor_street_number_suffix, lor_street_name,
> lor_street_type, lor_street_direction, lor_location_type_1,
> lor_location_number_1, lor_location_type_2, lor_location_number_2,
> lor_postal_code, lor_municipality, lor_province, lor_customer_group,
> lor_daytime_phone, lor_daytime_phone_ext, lod_value AS circuit_number
> FROM line_owner_report LEFT JOIN line_owner_data ON (lor_id=lod_lo_id

Steve Crawford | 4 Jan 20:31 2010

Re: DB is slow until DB is reloaded

Madison Kelly wrote:
> Hi all,
>
>   I've got a fairly small DB...
>
>   It slows down over time and I can't seem to find a way to get the
> performance to return without doing a dump and reload of the database...

Some questions:

Is autovacuum running? This is the most likely suspect. If not, things 
will bloat and you won't be getting appropriate "analyze" runs. Speaking 
of which, what happens if you just run "analyze"?
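
Concretely, those checks might look like this (a hypothetical psql session; on 8.1, autovacuum exists but is off unless enabled in postgresql.conf):

```sql
-- Is autovacuum turned on at all?
SHOW autovacuum;

-- Refresh planner statistics without the locking cost of VACUUM FULL.
ANALYZE VERBOSE line_owner_report;

-- A rough bloat check: relpages growing much faster than reltuples
-- suggests dead-row bloat that routine VACUUM/ANALYZE should control.
SELECT relname, relpages, reltuples
  FROM pg_class
 WHERE relname IN ('line_owner_report', 'line_owner_data');
```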

And as long as you are dumping and reloading anyway, how about version 
upgrading for bug reduction, performance improvement, and cool new features.

Cheers,
Steve


Brian Cox | 4 Jan 20:41 2010

query looping?

The query shown below [select count(distinct...] seems to be looping 
(99-101% CPU as shown by top for 11+ hours). This is using postgres 8.3.5 
on a dual quad core machine (Intel(R) Xeon(R) CPU X5460 @ 3.16GHz) with 
32G RAM. Can I provide any other info to help investigate this issue? Or 
any thoughts on how to prevent/avoid it?

Thanks,
Brian

top - 11:03:39 up 91 days, 22:39,  2 users,  load average: 3.73, 2.14, 1.42
Tasks: 135 total,   3 running, 132 sleeping,   0 stopped,   0 zombie
Cpu(s): 27.3% us,  7.7% sy,  0.0% ni, 54.0% id, 11.0% wa,  0.0% hi,  0.0% si
Mem:  33264272k total, 33247780k used,    16492k free,    17232k buffers
Swap:  4088532k total,   334264k used,  3754268k free, 26760304k cached

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
24585 postgres  17   0  572m 494m 484m R   99  1.5 646:13.63 postmaster

cemdb=# select procpid,query_start,current_query from pg_stat_activity;
  procpid |          query_start          |     current_query
 ---------+-------------------------------+-----------------------
    13210 | 2010-01-04 10:54:04.490107-08 | <IDLE>
    24496 | 2010-01-04 10:56:14.982997-08 | <IDLE>
    30636 | 2010-01-04 10:54:04.488569-08 | <IDLE>
     5309 | 2010-01-04 10:56:22.850159-08 | select 
procpid,query_start,current_query from pg_stat_activity;
    30637 | 2010-01-04 10:54:04.490152-08 | <IDLE>
    24500 | 2010-01-04 10:56:14.98354-08  | <IDLE>
    13213 | 2010-01-04 10:54:04.491743-08 | <IDLE>

Madison Kelly | 4 Jan 21:30 2010

Re: DB is slow until DB is reloaded

Steve Crawford wrote:
> Madison Kelly wrote:
>> Hi all,
>>
>>   I've got a fairly small DB...
>>
>>   It slows down over time and I can't seem to find a way to get the
>> performance to return without doing a dump and reload of the database...
> 
> Some questions:
> 
> Is autovacuum running? This is the most likely suspect. If not, things 
> will bloat and you won't be getting appropriate "analyze" runs. Speaking 
> of which, what happens if you just run "analyze"?
> 
> And as long as you are dumping and reloading anyway, how about version 
> upgrading for bug reduction, performance improvement, and cool new 
> features.
> 
> Cheers,
> Steve
> 

Yup, I even tried manually running 'VACUUM FULL' and it didn't help. As 
for upgrading:

a) I am trying to find a way around the dump/reload. I am doing it as a 
"last resort" only.
b) I want to keep the version in CentOS' repo.


