Tory M Blue | 30 Oct 07:49 2014

pgtune + configurations with 9.3


Greetings all,

I'm trying to wrap my head around updating my configuration files, which have probably been fairly static since before 8.4.

I've got some beefy hardware, but I have some tables that are over 57GB raw and end up at 140GB after indexes are applied. One index creation took 7 hours today, so it's time to dive in and see where I'm lacking and what I should be tweaking.

I looked at pgtune again today and the numbers it's spitting out took me aback; they are huge. In past conversations and attempts, a few of these larger numbers netted reduced performance rather than better (but that was on older versions of Postgres).

So I come here today seeking some affirmation that these numbers look good and that I should put them into my config, either staged or in one fell swoop.

At the same time, I will start migrating my config to the latest 9.3 template...
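
Unless advised otherwise, my rough plan is to trial the session-settable knobs first, before editing postgresql.conf (just a sketch; shared_buffers and wal_buffers need a restart, and checkpoint_segments a reload, so those can't be tested this way):

SET work_mem = '224MB';              -- pgtune's 223696kB, rounded
SET maintenance_work_mem = '2GB';
-- then rerun one of our known-slow queries under EXPLAIN (ANALYZE, BUFFERS) and compare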

Postgres Version: 9.3.4, Slony 2.1.3 (migrating to 2.2).
CentOS 6.x, 2.6.32-431.5.1.el6.x86_64
Big HP Boxen.

32 cores, 256GB of RAM. The DB is roughly 175GB in size, but many tables are hundreds of millions of rows.

The pgtune configuration that was spit out based on the information above:

max_connections = 300
shared_buffers = 64GB
effective_cache_size = 192GB
work_mem = 223696kB
maintenance_work_mem = 2GB
checkpoint_segments = 32
checkpoint_completion_target = 0.7
wal_buffers = 16MB
default_statistics_target = 100

my current configuration:

max_connections = 300
shared_buffers = 2000MB                 
effective_cache_size = 7GB
work_mem = 6GB     
maintenance_work_mem = 10GB    <-- bumped this to try to get my reindexes done
checkpoint_segments = 100  
#wal_buffers = 64kB  
#default_statistics_target = 10 

Here is my complete configuration. (This is my Slony slave server, so fsync and archiving are off; on my primary, fsync=on and archive_mode=on.)

listen_addresses = '*'
max_connections = 300
shared_buffers = 2000MB
max_prepared_transactions = 0
work_mem = 6GB
maintenance_work_mem = 10GB
fsync = off
checkpoint_segments = 100
checkpoint_timeout = 10min
checkpoint_warning = 3600s
wal_level = archive
archive_mode = off
archive_command = 'tar -czvpf /pg_archives/%f.tgz %p'
archive_timeout = 10min
random_page_cost = 2.0
effective_cache_size = 7GB
log_destination = 'stderr'
logging_collector = on
log_directory = '/data/logs'
log_filename = 'pgsql-%m-%d.log'
log_truncate_on_rotation = on
log_rotation_age = 1d
log_min_messages = info
log_min_duration_statement = 15s
log_line_prefix = '%t %d %u %r %p %m'
log_lock_waits = on
log_timezone = 'US/Pacific'
autovacuum_max_workers = 3
autovacuum_vacuum_threshold = 1000
autovacuum_analyze_threshold = 2000
datestyle = 'iso, mdy'
timezone = 'US/Pacific'
lc_messages = 'en_US.UTF-8'
lc_monetary = 'en_US.UTF-8'
lc_numeric = 'en_US.UTF-8'
lc_time = 'en_US.UTF-8'
deadlock_timeout = 5s

Also, while apparently it no longer matters in 9.3, my sysctl.conf has

kernel.shmmax = 68719476736
kernel.shmall = 4294967296

And pgtune recommended:

kernel.shmmax=137438953472
kernel.shmall=33554432

Also of note in my sysctl.conf config:

vm.zone_reclaim_mode = 0
vm.swappiness = 10 

Thanks for the assistance; watching these index creations crawl along when you know you have so many more compute cycles to give makes one go crazy.

Tory
Jeff Chen | 28 Oct 22:15 2014

Sanity checking big select performance

Hi friends!

I'd love to get a sanity check on whether a fat select query I'm doing makes sense given the infrastructure that we have.

We have 3 big tables that we typically join together for certain queries: a ~40 million row photos table, a ~20 million row users table, and a ~50 million row photo_to_album table that maps photos to albums.

We like to display real time analytics, which often results in a query like:

select (random aggregations )
from
photo_to_album join photos on photos.id = photo_to_album.photo_id
join users on users.id = photos.user_id
where
photo_to_album.album_id = <something>
and
photos.created_at between <some dates>
and <other junk>

We have indexes on all of the join columns and the WHERE-clause columns.
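
For concreteness, the indexes look roughly like this (a sketch from memory, so the names are made up; photos.id and users.id are the primary keys):

CREATE INDEX photo_to_album_album_id_idx ON photo_to_album (album_id);
CREATE INDEX photo_to_album_photo_id_idx ON photo_to_album (photo_id);
CREATE INDEX photos_user_id_idx ON photos (user_id);
CREATE INDEX photos_created_at_idx ON photos (created_at);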

One of these queries that should be targeting something like 300K photos takes 38 seconds to run (with an aggregate/nested loop taking effectively all of that time), and then upon second execution with a warm cache, 4 seconds.

Also worryingly, it spikes read IOPS to almost 1500/sec during that time, and write IOPS to 200/sec. When the query is not running, steady-state read IOPS are basically nil and write IOPS hover around 50-100.

This also increases the queue depth from basically 0 up to 6. Keeping the queue depth high seems to cause timeouts in other queries. The CPU is barely affected, if at all, hovering around 20%. Memory is also barely affected.
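
To see where the reads actually go, we've been running the query under EXPLAIN (ANALYZE, BUFFERS), along these lines (the literal values here are placeholders):

EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM photo_to_album
JOIN photos ON photos.id = photo_to_album.photo_id
JOIN users ON users.id = photos.user_id
WHERE photo_to_album.album_id = 1234
  AND photos.created_at BETWEEN '2014-10-01' AND '2014-10-28';
-- "shared read" blocks come off disk (the cold run); "shared hit" blocks come
-- from shared_buffers, which I assume explains the 38s vs. 4s difference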

We have an RDS Postgres database: an m3.2xlarge with 2000 Provisioned IOPS and 400GB of storage. This translates to 8 virtual CPUs, 30GiB of memory, and all-SSD drives.

Several questions here:

1) Is that level of IOPS normal?
2) Is it bad that the level of iops can queue requests that screw up the whole database even if it's just select queries?  Especially when the CPU and Memory are still plentiful?
3) What is up with the huge difference between cold and warm cache?

Any help is appreciated!

- jzc
jmcdonagh | 28 Oct 21:55 2014

Incredibly slow restore times after 9.0>9.2 upgrade

Hi, we have a nightly job that restores current production data to the
development databases as a 'warm spare' so that if the developers need
fresh data, it's ready during the day. When we moved from 9.0 to 9.2,
the restores suddenly went from a few hours to more like 15 hours. We're
in Amazon EC2; I've tried new EBS volumes, warmed them up, threw IOPS at
them, pretty much all the standard stuff to get more disk performance.

Here's the thing: the disk isn't saturated. The behavior I'm seeing seems
very odd to me. The source disk, which holds the dump, is saturated by
reads, which is great, but then I see almost nothing being written to the
postgres volume: just nothing happening, then a small burst. There is no
write-queue backup on the destination disk either. If I look at
pg_stat_activity I'll see something like:

COPY salesforce_reconciliation (salesforce_id, email, advisor_salesforce_id,
processed) FROM stdin

and even for small tables, that seems to take a very long time even though
the destination disk is almost at 0 utilization.

The dumps are created with pg_dump -Fc and restored with pg_restore -d db -j
2 -O -U postgres PostgreSQL-db.sql.

Is it possible that some default settings were changed from 9.0 to 9.2 that
would cause this kind of behavior? I'm stumped here. Thanks in advance for
any consideration here.
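
Would bumping the usual bulk-load knobs even help here, given the destination disk is idle? On the dev box I was thinking of something like this (a sketch, not yet tried):

maintenance_work_mem = 1GB      # faster index builds during the restore
checkpoint_segments = 64        # fewer forced checkpoints mid-restore
synchronous_commit = off        # dev-only box, rebuilt nightly from the dump anyway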


Huang, Suya | 28 Oct 07:26 2014

unnecessary sort in the execution plan when doing group by

Hi,

 

This is Greenplum Database 4.3.1.0.

 

Tables :

 

dev=# \d+ visits_weekly_new_3

Append-Only Columnar Table "uk.visits_weekly_new_3"

Column | Type | Modifiers | Storage | Compression Type | Compression Level | Block Size | Description

------------------+------------------------+-----------+----------+------------------+-------------------+------------+-------------

date | date | | plain | none | 0 | 32768 |

hw_id | character varying(256) | | extended | none | 0 | 32768 |

channel | character varying(256) | | extended | none | 0 | 32768 |

industries | integer[] | | extended | none | 0 | 32768 |

weighted_visits | double precision | | plain | none | 0 | 32768 |

projected_visits | double precision | | plain | none | 0 | 32768 |

Checksum: f

Child tables: visits_weekly_new_3_1_prt_1,

visits_weekly_new_3_1_prt_2,

visits_weekly_new_3_1_prt_3,

visits_weekly_new_3_1_prt_4,

visits_weekly_new_3_1_prt_5,

visits_weekly_new_3_1_prt_6,

visits_weekly_new_3_1_prt_7,

visits_weekly_new_3_1_prt_8,

visits_weekly_new_3_1_prt_9

Has OIDs: no

Options: appendonly=true, orientation=column

Distributed by: (date, channel)

 

dev=# \d+ temp.tmp_hw_channel

Table "temp.tmp_hw_channel"

Column | Type | Modifiers | Storage | Description

--------+------------------------+-----------+----------+-------------

id | character varying(256) | | extended |

Has OIDs: no

Distributed by: (id)

 

Below are the execution plans for two SQL statements. The only difference between them is that one groups by 2 columns and the other by 3; however, one uses a hash aggregate while the other sorts and uses a group aggregate. That leads to very different performance, although both produce the same result set.

 

 

dev=# explain ANALYZE                                                                                                                                                                                 

SELECT v.date,

       channel,

       SUM(weighted_visits) AS weighted_visits,

       SUM(projected_visits) AS projected_visits

FROM visits_weekly_new_3 v

INNER JOIN temp.tmp_hw_channel id ON v.hw_id = id.id

WHERE v.date >= '2014-05-03'

  AND v.date<= '2014-05-24'

GROUP BY v.date,

         channel;

 

                                                                                  QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Gather Motion 24:1  (slice2; segments: 24)  (cost=31286842.08..31287447.81 rows=1683 width=536)

   Rows out:  15380160 rows at destination with 14860 ms to first row, 23856 ms to end, start offset by 104 ms.

   ->  HashAggregate  (cost=31286842.08..31287447.81 rows=1683 width=536)

         Group By: v.date, v.channel

         Rows out:  Avg 640840.0 rows x 24 workers.  Max 642307 rows (seg14) with 18979 ms to first row, 19365 ms to end, start offset by 57 ms.

         Executor memory:  66688K bytes avg, 66794K bytes max (seg0).

         ->  Hash Join  (cost=299802.88..28414086.88 rows=11969814 width=132)

               Hash Cond: v.hw_id::text = id.id::text

               Rows out:  Avg 6657725.2 rows x 24 workers.  Max 7363985 rows (seg10) with 1225 ms to first row, 18839 ms to end, start offset by 63 ms.

               Executor memory:  35037K bytes avg, 35037K bytes max (seg0).

               Work_mem used:  35037K bytes avg, 35037K bytes max (seg0). Workfile: (0 spilling, 0 reused)

               (seg10)  Hash chain length 1.3 avg, 7 max, using 389733 of 1048589 buckets.

               ->  Append  (cost=0.00..5297308.80 rows=11969814 width=87)

                     Rows out:  Avg 11969813.7 rows x 24 workers.  Max 13482240 rows (seg10) with 1.284 ms to first row, 8168 ms to end, start offset by 1287 ms.

                     ->  Append-only Columnar Scan on visits_weekly_new_3_1_prt_1 v  (cost=0.00..1324327.20 rows=2992454 width=87)

                           Filter: date >= '2014-05-03'::date AND date <= '2014-05-24'::date

                           Rows out:  Avg 2992453.4 rows x 24 workers.  Max 3623583 rows (seg21) with 1.232 ms to first row, 1299 ms to end, start offset by 1279 ms.

                     ->  Append-only Columnar Scan on visits_weekly_new_3_1_prt_2 v  (cost=0.00..1324327.20 rows=2992454 width=87)

                           Filter: date >= '2014-05-03'::date AND date <= '2014-05-24'::date

                           Rows out:  Avg 2992453.4 rows x 24 workers.  Max 3767678 rows (seg10) with 0.312 ms to first row, 2123 ms to end, start offset by 5966 ms.

                     ->  Append-only Columnar Scan on visits_weekly_new_3_1_prt_3 v  (cost=0.00..1324328.20 rows=2992454 width=87)

                           Filter: date >= '2014-05-03'::date AND date <= '2014-05-24'::date

                           Rows out:  Avg 2992453.4 rows x 24 workers.  Max 4283207 rows (seg15) with 0.295 ms to first row, 1444 ms to end, start offset by 9383 ms.

                     ->  Append-only Columnar Scan on visits_weekly_new_3_1_prt_4 v  (cost=0.00..1324326.20 rows=2992454 width=87)

                           Filter: date >= '2014-05-03'::date AND date <= '2014-05-24'::date

                           Rows out:  Avg 2992453.4 rows x 24 workers.  Max 3760361 rows (seg12) with 0.299 ms to first row, 1309 ms to end, start offset by 14026 ms.

               ->  Hash  (cost=127888.98..127888.98 rows=487373 width=45)

                     Rows in:  Avg 487556.0 rows x 24 workers.  Max 487556 rows (seg0) with 1188 ms to end, start offset by 86 ms.

                     ->  Broadcast Motion 24:24  (slice1; segments: 24)  (cost=0.00..127888.98 rows=487373 width=45)

                           Rows out:  Avg 487556.0 rows x 24 workers at destination.  Max 487556 rows (seg0) with 0.094 ms to first row, 590 ms to end, start offset by 86 ms.

                           ->  Seq Scan on tmp_hw_channel id  (cost=0.00..6045.73 rows=20308 width=45)

                                 Rows out:  Avg 20314.8 rows x 24 workers.  Max 20536 rows (seg23) with 0.131 ms to first row, 6.642 ms to end, start offset by 69 ms.

Slice statistics:

   (slice0)    Executor memory: 286K bytes.

   (slice1)    Executor memory: 774K bytes avg x 24 workers, 774K bytes max (seg0).

   (slice2)    Executor memory: 149541K bytes avg x 24 workers, 149658K bytes max (seg0).  Work_mem: 35037K bytes max.

Statement statistics:

   Memory used: 1048576K bytes

Settings:  enable_bitmapscan=on; enable_indexscan=on; enable_sort=off

Total runtime: 25374.000 ms

(40 rows)

 

Time: 25383.704 ms

 

 

dev=# explain ANALYZE                                       

SELECT v.date,

       channel,

       industries,

       SUM(weighted_visits) AS weighted_visits,

       SUM(projected_visits) AS projected_visits

FROM visits_weekly_new_3 v

INNER JOIN temp.tmp_hw_channel id ON v.hw_id = id.id

WHERE v.date >= '2014-05-03'

  AND v.date<= '2014-05-24'

GROUP BY v.date,

         channel,

         industries;

 

                                                                                                                                                                                                                              QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Gather Motion 24:1  (slice2; segments: 24)  (cost=152269717.33..157009763.41 rows=1196982 width=568)

   Rows out:  15380160 rows at destination with 35320 ms to first row, 70091 ms to end, start offset by 102 ms.

   ->  GroupAggregate  (cost=152269717.33..157009763.41 rows=1196982 width=568)

         Group By: v.date, v.channel, v.industries

         Rows out:  Avg 640840.0 rows x 24 workers.  Max 642307 rows (seg14) with 48843 ms to first row, 54853 ms to end, start offset by 54 ms.

         ->  Sort  (cost=152269717.33..152987906.13 rows=11969814 width=155)

               Sort Key: v.date, v.channel, v.industries

               Rows out:  Avg 6657725.2 rows x 24 workers.  Max 7363985 rows (seg10) with 64604 ms to first row, 65912 ms to end, start offset by 62 ms.

               Executor memory:  692755K bytes avg, 760338K bytes max (seg15).

               Work_mem used:  692755K bytes avg, 760338K bytes max (seg15). Workfile: (24 spilling, 0 reused)

               Work_mem wanted: 1603070K bytes avg, 1782291K bytes max (seg10) to lessen workfile I/O affecting 24 workers.

               ->  Hash Join  (cost=299802.88..28834900.88 rows=11969814 width=155)

                     Hash Cond: v.hw_id::text = id.id::text

                     Rows out:  Avg 6657725.2 rows x 24 workers.  Max 7363985 rows (seg10) with 1226 ms to first row, 24249 ms to end, start offset by 62 ms.

                     Executor memory:  35037K bytes avg, 35037K bytes max (seg0).

                     Work_mem used:  35037K bytes avg, 35037K bytes max (seg0). Workfile: (0 spilling, 0 reused)

                     (seg10)  Hash chain length 1.3 avg, 7 max, using 389733 of 1048589 buckets.

                     (seg15)  Hash chain length 1.3 avg, 7 max, using 389733 of 1048589 buckets.

                     ->  Append  (cost=0.00..5297308.80 rows=11969814 width=111)

                           Rows out:  Avg 11969813.7 rows x 24 workers.  Max 13482240 rows (seg10) with 0.846 ms to first row, 11214 ms to end, start offset by 1287 ms.

                           ->  Append-only Columnar Scan on visits_weekly_new_3_1_prt_1 v  (cost=0.00..1324327.20 rows=2992454 width=111)

                                 Filter: date >= '2014-05-03'::date AND date <= '2014-05-24'::date

                                 Rows out:  Avg 2992453.4 rows x 24 workers.  Max 3623583 rows (seg21) with 0.624 ms to first row, 1465 ms to end, start offset by 1264 ms.

                           ->  Append-only Columnar Scan on visits_weekly_new_3_1_prt_2 v  (cost=0.00..1324327.20 rows=2992454 width=110)

                                 Filter: date >= '2014-05-03'::date AND date <= '2014-05-24'::date

                                 Rows out:  Avg 2992453.4 rows x 24 workers.  Max 3767678 rows (seg10) with 0.486 ms to first row, 2419 ms to end, start offset by 8616 ms.

                           ->  Append-only Columnar Scan on visits_weekly_new_3_1_prt_3 v  (cost=0.00..1324328.20 rows=2992454 width=111)

                                 Filter: date >= '2014-05-03'::date AND date <= '2014-05-24'::date

                                 Rows out:  Avg 2992453.4 rows x 24 workers.  Max 4283207 rows (seg15) with 0.453 ms to first row, 2357 ms to end, start offset by 13242 ms.

                           ->  Append-only Columnar Scan on visits_weekly_new_3_1_prt_4 v  (cost=0.00..1324326.20 rows=2992454 width=110)

                                 Filter: date >= '2014-05-03'::date AND date <= '2014-05-24'::date

                                 Rows out:  Avg 2992453.4 rows x 24 workers.  Max 3760361 rows (seg12) with 0.440 ms to first row, 2532 ms to end, start offset by 35558 ms.

                     ->  Hash  (cost=127888.98..127888.98 rows=487373 width=45)

                           Rows in:  Avg 487556.0 rows x 24 workers.  Max 487556 rows (seg0) with 1184 ms to end, start offset by 74 ms.

                           ->  Broadcast Motion 24:24  (slice1; segments: 24)  (cost=0.00..127888.98 rows=487373 width=45)

                                 Rows out:  Avg 487556.0 rows x 24 workers at destination.  Max 487556 rows (seg0) with 0.168 ms to first row, 622 ms to end, start offset by 74 ms.

                                 ->  Seq Scan on tmp_hw_channel id  (cost=0.00..6045.73 rows=20308 width=45)

                                       Rows out:  Avg 20314.8 rows x 24 workers.  Max 20536 rows (seg23) with 0.263 ms to first row, 6.508 ms to end, start offset by 70 ms.

Slice statistics:

   (slice0)    Executor memory: 286K bytes.

   (slice1)    Executor memory: 774K bytes avg x 24 workers, 774K bytes max (seg0).

   (slice2)  * Executor memory: 771617K bytes avg x 24 workers, 843298K bytes max (seg15).  Work_mem: 760338K bytes max, 1782291K bytes wanted.

Statement statistics:

   Memory used: 1048576K bytes

   Memory wanted: 3565580K bytes

Settings:  enable_bitmapscan=on; enable_indexscan=on; enable_sort=off

Total runtime: 72071.845 ms

(47 rows)

 

Time: 72078.079 ms
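
From the "Work_mem wanted" and "Memory wanted" lines it looks like the sort spills to workfiles. While I try to understand why the planner chooses GroupAggregate at all, would raising the per-statement memory be a reasonable workaround? Something like (assuming statement_mem applies to this session, and noting it is bounded by max_statement_mem):

set statement_mem = '2000MB';  -- up from the default; the "Memory wanted: 3565580K bytes"
                               -- above suggests the plan would like roughly 3.5GB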

pinker | 23 Oct 15:24 2014

Checkpoints tuning

I have saved data from pg_stat_bgwriter view following Greg Smith's advice
from his book:
select now(),* from pg_stat_bgwriter; 

and then aggregated the data with query from his book as well.

checkpoint_segments was 30 at first; the next day I increased it to 200,
and the results changed:

<http://postgresql.1045698.n5.nabble.com/file/n5824026/Auswahl_235.png> 

Now the percentage of checkpoints required because of the number of segments
is bigger, and the backend writer's share is also too high; I assume that's
not what should happen.
I'm also not sure how to interpret the correlation between allocated and
written data. Is a bigger amount of data written per second a good sign?
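
For reference, the ratios I'm looking at are computed roughly like this (shown against the raw cumulative view for brevity; Greg's method takes deltas between the saved snapshots):

select round(100.0 * checkpoints_req
             / nullif(checkpoints_timed + checkpoints_req, 0), 1) as checkpoints_req_pct,
       round(100.0 * buffers_checkpoint
             / nullif(buffers_checkpoint + buffers_clean + buffers_backend, 0), 1) as checkpoint_write_pct,
       round(100.0 * buffers_backend
             / nullif(buffers_checkpoint + buffers_clean + buffers_backend, 0), 1) as backend_write_pct
from pg_stat_bgwriter;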


Montana Low | 22 Oct 00:25 2014

ERROR: out of memory | with 23GB cached 7GB reserved on 30GB machine

I'm running postgres-9.3 on a 30GB EC2 Xen instance with Linux kernel 3.16.3. I receive numerous "ERROR: out of memory" messages in the log, which abort client requests, even though there appears to be 23GB available in the OS cache.

There is no swap on the box. Postgres sits behind pgbouncer, which protects it from the 200 real clients by limiting connections to 32; there are rarely more than 20 active connections, although postgres max_connections is set very high for historic reasons. There is also a 4GB Java process running on the box.




relevant postgresql.conf:

max_connections = 1000                  # (change requires restart)
shared_buffers = 7GB                    # min 128kB
work_mem = 40MB                         # min 64kB
maintenance_work_mem = 1GB              # min 1MB
effective_cache_size = 20GB



sysctl.conf:

vm.swappiness = 0
vm.overcommit_memory = 2
kernel.shmmax=34359738368
kernel.shmall=8388608
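
One thing I'm unsure about: with vm.overcommit_memory = 2, no swap, and the default vm.overcommit_ratio of 50, I believe the kernel's commit limit works out to only about 15GB on this 30GB box, no matter how much of the 23GB "cached" is reclaimable. Checking it:

grep -E 'CommitLimit|Committed_AS' /proc/meminfo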



log example:

ERROR:  out of memory
DETAIL:  Failed on request of size 67108864.
STATEMENT:  SELECT  "package_texts".* FROM "package_texts"  WHERE "package_texts"."id" = $1 LIMIT 1



example pg_top, showing 23GB available in cache:

last pid:  6607;  load avg:  3.59,  2.32,  2.61;       up 16+09:17:29 20:49:51
18 processes: 1 running, 17 sleeping
CPU states: 22.5% user,  0.0% nice,  4.9% system, 63.2% idle,  9.4% iowait
Memory: 29G used, 186M free, 7648K buffers, 23G cached
DB activity: 2479 tps,  1 rollbs/s, 217 buffer r/s, 99 hit%,  11994 row r/s, 3820 row w/s  
DB I/O:     0 reads/s,     0 KB/s,     0 writes/s,     0 KB/s  
DB disk: 149.8 GB total, 46.7 GB free (68% used)
Swap:



example top showing the only other significant 4GB process on the box:

top - 21:05:09 up 16 days,  9:32,  2 users,  load average: 2.73, 2.91, 2.88
Tasks: 147 total,   3 running, 244 sleeping,   0 stopped,   0 zombie
%Cpu(s): 22.1 us,  4.1 sy,  0.0 ni, 62.9 id,  9.8 wa,  0.0 hi,  0.7 si,  0.3 st
KiB Mem:  30827220 total, 30642584 used,   184636 free,     7292 buffers
KiB Swap:        0 total,        0 used,        0 free. 23449636 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 7407 postgres  20   0 7604928  10172   7932 S  29.6  0.0   2:51.27 postgres
10469 postgres  20   0 7617716 176032 160328 R  11.6  0.6   0:01.48 postgres
10211 postgres  20   0 7630352 237736 208704 S  10.6  0.8   0:03.64 postgres
18202 elastic+  20   0 8726984 4.223g   4248 S   9.6 14.4 883:06.79 java
9711 postgres  20   0 7619500 354188 335856 S   7.0  1.1   0:08.03 postgres
3638 postgres  20   0 7634552 1.162g 1.127g S   6.6  4.0   0:50.42 postgres
 
Björn Wittich | 21 Oct 19:26 2014

extremely bad select performance on huge table

Hi newsgroup,

I have a very huge table (70 million rows) with a text key of about 30
characters per key. A select on this indexed column "mycolumn" (index
"myprimkey") took more than 30 mins.

Here is the explain (analyze, buffers) for: select mycolumn from myhugetable

"Index Only Scan using myprimkey on myhugetable  (cost=0.00..8224444.82 
rows=71768080 width=33) (actual time=16.722..2456300.778 rows=71825999 
loops=1)"

"  Heap Fetches: 356861"

"Total runtime: 2503009.611 ms"

Even repeating the query does not show a performance improvement. I
assume that the index itself is too large for my db cache. What can I do
to gain performance? Which parameters can I adapt? I have a huge Linux
machine with 72 GB RAM.

Note: This select is just for testing. My final statement will be a join 
on this table via the "mycolumn" column.
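
I am not sure how to verify my cache assumption; I assume something like this shows the sizes involved:

select pg_size_pretty(pg_relation_size('myprimkey')) as index_size,
       pg_size_pretty(pg_table_size('myhugetable')) as table_size;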

Thanks for your help
Björn


john | 21 Oct 14:57 2014

Query Performance Problem



Hi all,

I'm experimenting with table partitioning through inheritance. I'm testing a query as follows:

explain (analyze, buffers)
select response.id
from claim.response
where response.account_id = 4766
and response.expire_timestamp is null
and response.create_timestamp >= DATE '2014-08-01'
order by create_timestamp;

The response table looks like this:
"account_id";"integer"
"file_type_id";"integer"
"receiver_inbound_detail_id";"integer"
"processing_status_id";"integer"
"processing";"boolean"
"expire_timestamp";"timestamp without time zone"
"last_mod_timestamp";"timestamp without time zone"
"create_timestamp";"timestamp without time zone"
"response_trace_nbr";"character varying"
"posted_timestamp";"timestamp without time zone"
"need_to_post";"boolean"
"response_message";"text"
"worked";"boolean"
"response_status_id";"integer"
"response_type_id";"integer"
"outbound_claim_detail_id";"bigint"
"id";"bigint"

Here are some rowcounts:

SELECT count(*) from claim_response.response_201408;
  count
---------
 4585746
(1 row)

Time: 7271.054 ms
SELECT count(*) from claim_response.response_201409;
  count
---------
 3523370
(1 row)

Time: 4341.116 ms
SELECT count(*) from claim_response.response_201410;
 count
-------
   154
(1 row)

Time: 0.258 ms

The entire table has 225,665,512 rows. I read that a partitioning rule of thumb is that benefits of partitioning occur starting around 100 million rows.

SELECT count(*) from claim.response;
   count
-----------
 225665512
(1 row)

Time: 685064.637 ms


The partitioning is on the create_timestamp field.
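
Each monthly partition carries a CHECK constraint along these lines (a sketch; the actual constraint names differ):

ALTER TABLE claim_response.response_201408
  ADD CONSTRAINT response_201408_create_timestamp_check
  CHECK (create_timestamp >= DATE '2014-08-01'
     AND create_timestamp <  DATE '2014-09-01');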

The server is Red Hat Enterprise Linux Server release 6.2 (Santiago) on a VM with 8 GB RAM and 2 CPUs:

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                2
On-line CPU(s) list:   0,1
Thread(s) per core:    1
Core(s) per socket:    2
CPU socket(s):         1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 44
Stepping:              2
CPU MHz:               2660.000
BogoMIPS:              5320.00
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              12288K
NUMA node0 CPU(s):     0,1



2 users,  load average: 0.00, 0.12, 0.37


Please see the following for the explain analysis: http://explain.depesz.com/s/I3SL

I'm trying to understand why I'm getting the yellow, orange, and red on the inclusive times, and the yellow on the exclusive, on that explain.depesz.com page.
I'm relatively new to PostgreSQL, but I've been an Oracle DBA for some time. I suspect the I/O may be dragging, but I don't know how to dig that information out from here. Please point out anything else you can decipher from this.

Thanks,

John
Marco Di Cesare | 20 Oct 22:32 2014

Query with large number of joins

We are using Postgres for the first time after being SQL Server users for a long time, so forgive us for being noobs.

 

We are using a BI tool that generates a query with an unusually large number of joins. My understanding is that with this many joins the Postgres query planner can't possibly do an exhaustive search, so it drops into a heuristic algorithm. Unfortunately, the query runs quite slowly (~35 seconds) and seems to ignore primary keys and indexes where available.

 

Query plan here (sorry had to anonymize):

http://explain.depesz.com/s/Uml

 

Line 30 is one of the pain points: a full table scan runs on 4.2 million rows even though there are indexes on oscar_bravo.foxtrot_four and oscar_charlie.foxtrot_four.

 

We've tried playing around with the join_collapse_limit value by upping it from the default of 8 to 10 or 12, but it doesn't seem to help much. Cranking the value up to an unreasonable 20 does shave some seconds off the query time, but not substantially (explain plan with the value set to 20: http://explain.depesz.com/s/sW6).

 

We haven't tried playing around with the geqo_threshold at this point.
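
For reference, the per-session experiments look like this (the geqo_threshold value is illustrative, since we haven't touched it yet):

SET join_collapse_limit = 20;   -- the "unreasonable" value that shaved a few seconds
SET geqo_threshold = 25;        -- untried; my understanding is it should stay above
                                -- join_collapse_limit so the exhaustive planner still runs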

 

Any thoughts on ways to speed up the run time of this query or any other Postgres settings we should be aware of when dealing with this unusually large number of joins?

 

Thanks in advance

 

 

 

Marco Di Cesare

 

Laurent Martelli | 19 Oct 06:10 2014

IS NOT NULL and LEFT JOIN

Hello there,

I have a strange query plan involving an IS NOT NULL and a LEFT JOIN.

I grant you that the query can be written without the JOIN on user_user_info,
but it is generated like this by Hibernate. Just moving the IS NOT NULL condition
to the other side of the useless JOIN makes a big difference in the query plan:

-- THE BAD ONE : given the selectivity on c.name and c.email, barely more than one row will ever be returned
explain analyze select c.*
       from contact_contact c
       left outer join user_user_info u on c.user_info=u.id
       left outer join contact_address a on c.address=a.id
      where lower(c.name)='martelli'
        and c.email='dsfze@ezrfz.com' or u.id is not null;
                                                           QUERY PLAN                                                          
--------------------------------------------------------------------------------------------------------------------------------
 Hash Left Join  (cost=1.83..2246.76 rows=59412 width=4012) (actual time=53.645..53.645 rows=0 loops=1)
   Hash Cond: (c.user_info = u.id)
   Filter: (((lower((c.name)::text) = 'martelli'::text) AND ((c.email)::text = 'dsfze@ezrfz.com'::text)) OR (u.id IS NOT NULL))
   Rows Removed by Filter: 58247
   ->  Seq Scan on contact_contact c  (cost=0.00..2022.12 rows=59412 width=4012) (actual time=0.007..6.892 rows=58247 loops=1)
   ->  Hash  (cost=1.37..1.37 rows=37 width=8) (actual time=0.029..0.029 rows=37 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 2kB
         ->  Seq Scan on user_user_info u  (cost=0.00..1.37 rows=37 width=8) (actual time=0.004..0.015 rows=37 loops=1)
 Planning time: 0.790 ms
 Execution time: 53.712 ms

-- THE GOOD ONE (test IS NOT NULL on contact0_.user_info instead of userinfo1_.id)
explain analyze select c.*         
       from contact_contact c        
       left outer join user_user_info u on c.user_info=u.id          
       left outer join contact_address a on c.address=a.id         
      where lower(c.name)='martelli'
        and c.email='dsfze@ezrfz.com' or c.user_info is not null;
                                                                 QUERY PLAN                                                                
--------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on contact_contact c  (cost=8.60..16.41 rows=1 width=4012) (actual time=0.037..0.037 rows=0 loops=1)
   Recheck Cond: (((email)::text = 'dsfze@ezrfz.com'::text) OR (user_info IS NOT NULL))
   Filter: (((lower((name)::text) = 'martelli'::text) AND ((email)::text = 'dsfze@ezrfz.com'::text)) OR (user_info IS NOT NULL))
   ->  BitmapOr  (cost=8.60..8.60 rows=2 width=0) (actual time=0.034..0.034 rows=0 loops=1)
         ->  Bitmap Index Scan on idx_contact_email  (cost=0.00..4.30 rows=2 width=0) (actual time=0.027..0.027 rows=0 loops=1)
                Index Cond: ((email)::text = 'dsfze@ezrfz.com'::text)
         ->  Bitmap Index Scan on contact_contact_user_info_idx  (cost=0.00..4.30 rows=1 width=0) (actual time=0.007..0.007 rows=0 loops=1)
               Index Cond: (user_info IS NOT NULL)
 Planning time: 0.602 ms
 Execution time: 0.118 ms
My tables are as follows, and I use postgres 9.4:

Table "public.contact_contact"
Column | Type | Modifiers | Storage
-------+------+-----------+--------
id | bigint | not null | plain
archived | boolean | | plain
version | integer | | plain
created_on | timestamp without time zone | | plain
updated_on | timestamp without time zone | | plain
actor_ref | character varying(255) | | extended
addressl1 | character varying(255) | | extended
comment | text | | extended
contact_partner_ok | boolean | | plain
date_of_birth | date | | plain
email | character varying(255) | | extended
email_pro | character varying(255) | | extended
fax | character varying(255) | | extended
first_name | character varying(255) | | extended
fixed_phone1 | character varying(255) | | extended
fixed_phone2 | character varying(255) | | extended
fixed_phone_pro | character varying(255) | | extended
import_key1 | character varying(255) | | extended
import_key2 | character varying(255) | | extended
koala_id | character varying(255) | | extended
mobile_phone_perso | character varying(255) | | extended
mobile_phone_pro | character varying(255) | | extended
name | character varying(255) | not null | extended
ola_email | character varying(255) | | extended
ola_phone | character varying(255) | | extended
person_category_select | character varying(255) | | extended
web_site | character varying(255) | | extended
year_of_birth | integer | | plain
created_by | bigint | | plain
updated_by | bigint | | plain
action_event_source | bigint | | plain
address | bigint | | plain
address_pro | bigint | | plain
jobtitle | bigint | | plain
merged_with | bigint | | plain
nationality_country | bigint | | plain
origin | bigint | | plain
place_of_birth_address | bigint | | plain
title | bigint | | plain
user_info | bigint | | plain
import_origin | character varying(255) | | extended
duplicates | bigint | | plain
Indexes:
    "contact_contact_pkey" PRIMARY KEY, btree (id)
    "uk_bx19539x7h0y0w4p4uw9gnqbo" UNIQUE CONSTRAINT, btree (koala_id)
    "uk_vg25de8jcu18m89o9dy2n4fe" UNIQUE CONSTRAINT, btree (import_key1)
    "contact_contact_action_event_source_idx" btree (action_event_source)
    "contact_contact_address_idx" btree (address)
    "contact_contact_address_l1_idx" btree (addressl1)
    "contact_contact_address_pro_idx" btree (address_pro)
    "contact_contact_jobtitle_idx" btree (jobtitle)
    "contact_contact_merged_with_idx" btree (merged_with)
    "contact_contact_name_idx" btree (name)
    "contact_contact_nationality_country_idx" btree (nationality_country)
    "contact_contact_origin_idx" btree (origin)
    "contact_contact_place_of_birth_address_idx" btree (place_of_birth_address)
    "contact_contact_title_idx" btree (title)
    "contact_contact_user_info_idx" btree (user_info)
    "idx_contact_email" btree (email)
    "idx_contact_lower_name" btree (lower(name::text))
    "idx_contact_search_name" btree (lower(name::text), lower(first_name::text))
Foreign-key constraints:
    "fk_8dj7rw3jrdxk4vxbi6vony0ne" FOREIGN KEY (created_by) REFERENCES auth_user(id)
    "fk_9s1dhwrvw6lq74fvty6oj2wc5" FOREIGN KEY (address_pro) REFERENCES contact_address(id)
    "fk_9wjsgh8lt5ixbshx9pjwmjtk1" FOREIGN KEY (origin) REFERENCES crm_origin(id)
    "fk_ad53x8tdando1w1jdlyxcop9v" FOREIGN KEY (duplicates) REFERENCES contact_contact(id)
    "fk_edusucr1gdfj99vtm0a70gggg" FOREIGN KEY (title) REFERENCES contact_title(id)
    "fk_g7u75rjd754m7evn2alckjvka" FOREIGN KEY (merged_with) REFERENCES contact_contact(id)
    "fk_j72hkuq0337v6utjbf85hhvxg" FOREIGN KEY (action_event_source) REFERENCES crm_action_event_source(id)
    "fk_k73mcu7swia6uf6qpp4v6lwxf" FOREIGN KEY (updated_by) REFERENCES auth_user(id)
    "fk_mvpl7wudcdqgitmmsd900od97" FOREIGN KEY (place_of_birth_address) REFERENCES contact_address(id)
    "fk_onriw4jpgeuvhfk827amxry8k" FOREIGN KEY (address) REFERENCES contact_address(id)
    "fk_rpkvno8705gap9ejj4wnnb7hl" FOREIGN KEY (nationality_country) REFERENCES territory_country(id)
    "fk_s9fsy33u5a9ke8wee9mc2vpsx" FOREIGN KEY (user_info) REFERENCES user_user_info(id)
    "fk_t8uexb8lmgaftjsnn63eoty90" FOREIGN KEY (jobtitle) REFERENCES contact_jobtitle(id)

Table "public.user_user_info"
Column | Type | Modifiers | Storage
-------+------+-----------+--------
id | bigint | not null | plain
archived | boolean | | plain
version | integer | | plain
created_on | timestamp without time zone | | plain
updated_on | timestamp without time zone | | plain
full_name | character varying(255) | | extended
import_key | character varying(255) | | extended
import_username | character varying(255) | | extended
today | timestamp without time zone | | plain
user_system_ok | boolean | | plain
created_by | bigint | | plain
updated_by | bigint | | plain
active_company | bigint | | plain
agency | bigint | | plain
internal_user | bigint | not null | plain
Indexes:
    "user_user_info_pkey" PRIMARY KEY, btree (id)
    "uk_99o17944ddytysui6b05lxyb2" UNIQUE CONSTRAINT, btree (import_key)
    "uk_cqgrw75h35ts19uixn03rkjsu" UNIQUE CONSTRAINT, btree (internal_user)
    "uk_jtsvu4r7s12nnh9o2sloqyqv4" UNIQUE CONSTRAINT, btree (import_username)
    "user_user_info_active_company_idx" btree (active_company)
    "user_user_info_agency_idx" btree (agency)
    "user_user_info_full_name_idx" btree (full_name)
Foreign-key constraints:
    "fk_cojxp4r7d8n2l135gy4xa4vak" FOREIGN KEY (active_company) REFERENCES contact_company(id)
    "fk_cqgrw75h35ts19uixn03rkjsu" FOREIGN KEY (internal_user) REFERENCES auth_user(id)
    "fk_k3riohsx7jrhxkxdmxyeqflq1" FOREIGN KEY (updated_by) REFERENCES auth_user(id)
    "fk_r3e16hs6puibteaby3rk42yg0" FOREIGN KEY (created_by) REFERENCES auth_user(id)
    "fk_t389sdkhi9owy0xbhec2nqp5w" FOREIGN KEY (agency) REFERENCES contact_agency(id)

Table "public.contact_address"
Column | Type | Modifiers | Storage
-------+------+-----------+--------
id | bigint | not null | plain
archived | boolean | | plain
version | integer | | plain
created_on | timestamp without time zone | | plain
updated_on | timestamp without time zone | | plain
addressl2 | character varying(255) | | extended
addressl3 | character varying(255) | | extended
addressl4 | character varying(255) | | extended
addressl5 | character varying(255) | | extended
addressl6 | character varying(255) | | extended
certified_ok | boolean | | plain
consumption_place_ok | boolean | | plain
full_name | character varying(255) | | extended
insee_code | character varying(255) | | extended
koala_id | character varying(255) | | extended
created_by | bigint | | plain
updated_by | bigint | | plain
addressl7country | bigint | | plain
commune | bigint | | plain
Indexes:
    "contact_address_pkey" PRIMARY KEY, btree (id)
    "contact_address_address_l7_country_idx" btree (addressl7country)
    "contact_address_commune_idx" btree (commune)
    "contact_address_full_name_idx" btree (full_name)
Foreign-key constraints:
    "fk_4yx7nnewflhyjdm5tue5qntbg" FOREIGN KEY (commune) REFERENCES territory_commune(id)
    "fk_5lwaygtve0ol8ma53picsdef" FOREIGN KEY (addressl7country) REFERENCES territory_country(id)
    "fk_p9svu5ssynimpuu0is3j396lt" FOREIGN KEY (updated_by) REFERENCES auth_user(id)
    "fk_rm0lcgnys2n97ad62jkm53qlt" FOREIGN KEY (created_by) REFERENCES auth_user(id)

Regards,
Laurent
Josh Berkus | 16 Oct 22:06 2014

9.4 performance improvements test

All,

Thought I'd share some pgbench runs I did on two servers, one running
9.3 and one running 9.4.

A small (512MB) pgbench test didn't show much difference between the two:

9.3.5:

transaction type: TPC-B (sort of)
scaling factor: 200
query mode: simple
number of clients: 16
number of threads: 4
duration: 600 s
number of transactions actually processed: 7686217
latency average: 1.249 ms
tps = 12810.135226 (including connections establishing)
tps = 12810.277332 (excluding connections establishing)
statement latencies in milliseconds:
        0.001833        \set nbranches 1 * :scale
        0.000513        \set ntellers 10 * :scale
        0.000447        \set naccounts 100000 * :scale
        0.000597        \setrandom aid 1 :naccounts
        0.000585        \setrandom bid 1 :nbranches
        0.000506        \setrandom tid 1 :ntellers
        0.000507        \setrandom delta -5000 5000
        0.053684        BEGIN;
        0.161115        UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
        0.143763        SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
        0.168801        UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
        0.183900        UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
        0.137570        INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
        0.389587        END;

9.4b3:

transaction type: TPC-B (sort of)
scaling factor: 200
query mode: simple
number of clients: 16
number of threads: 4
duration: 600 s
number of transactions actually processed: 7822118
latency average: 1.227 ms
tps = 13036.312006 (including connections establishing)
tps = 13036.498067 (excluding connections establishing)
statement latencies in milliseconds:
        0.001817        \set nbranches 1 * :scale
        0.000506        \set ntellers 10 * :scale
        0.000439        \set naccounts 100000 * :scale
        0.000587        \setrandom aid 1 :naccounts
        0.000497        \setrandom bid 1 :nbranches
        0.000487        \setrandom tid 1 :ntellers
        0.000506        \setrandom delta -5000 5000
        0.053509        BEGIN;
        0.160929        UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
        0.145014        SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
        0.169506        UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
        0.188648        UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
        0.141014        INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
        0.358340        END;

However, on a big disk-bound database, 9.4 had 20% better throughput.
The database in this case is around 200GB, on a server with 128GB RAM:

9.3.5:

starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 10000
query mode: simple
number of clients: 16
number of threads: 4
duration: 3600 s
number of transactions actually processed: 1944320
latency average: 29.625 ms
tps = 539.675140 (including connections establishing)
tps = 539.677426 (excluding connections establishing)

9.4b3:

starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 10000
query mode: simple
number of clients: 16
number of threads: 4
duration: 3600 s
number of transactions actually processed: 2422502
latency average: 23.777 ms
tps = 672.816239 (including connections establishing)
tps = 672.821433 (excluding connections establishing)
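
For reference, the runs were along these lines (reconstructed from the parameters above, so the exact invocations and the database name are approximate):

pgbench -i -s 200 bench               # init; -s 10000 for the big disk-bound test
pgbench -c 16 -j 4 -T 600 -r bench    # -r prints the per-statement latencies
pgbench -c 16 -j 4 -T 3600 bench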

I suspect this is due to the improvements in writing less to WAL.  If
so, good work, guys!

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


