eudald_v | 30 Jun 16:52 2015

Sudden connection and load average spikes with postgresql 9.3

Hello all,
This is my very first message to the Postgresql community, and I really hope
you can help me solve the trouble I'm facing.

I have an 80-core server (multithreaded) with close to 500GB of RAM.

My configuration is:
max_connections: 1500 (was 850)
shared_buffers: 188GB
work_mem: 110MB (was 220MB)
maintenance_work_mem: 256MB
effective_cache_size: 340GB

The database is running under PostgreSQL 9.3.9 on Ubuntu Server 14.04 LTS
(kernel 3.13.0-55-generic).

For the past two days I have been experiencing random episodes in which the
connections climb until they reach max_connections, and the load average of the
server goes to around 300-400, making every command issued on the server take
forever. When this happens, RAM usage is relatively low (70GB used), core
activity is lower than usual, and sometimes the server swaps (vm.swappiness is
set to 10).

I've been trying to find the cause of this underperformance, even logging all
queries in debug mode, but I have found nothing strange so far.

I really don't know what my next step should be to isolate the problem, which
is why I'm writing to you. Have you ever seen this behaviour before?
Could you kindly suggest any steps to follow?
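
In case it helps to narrow things down, would a snapshot like the following
during one of the spikes be a sensible starting point? (Just a sketch; it
assumes 9.3's pg_stat_activity layout, including the boolean "waiting" column.)

-- Sketch: summarize what the piled-up sessions are doing during a spike.
SELECT state,
       waiting,
       count(*)                 AS sessions,
       max(now() - query_start) AS longest_query
FROM pg_stat_activity
GROUP BY state, waiting
ORDER BY sessions DESC;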

Josh Berkus | 27 Jun 21:29 2015

Does anyone have python code which digests pgbench -r output?

All:

Does anyone have python code which parses pgbench -r output for
statistical analysis?  I dug through pgbench-tools, but its reading of
pgbench results isn't easily separable from the rest.

I'll write a lib, but if someone already has one, it would save me some
time in developing a new public benchmark.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Sheena, Prabhjot | 24 Jun 20:09 2015

pgbouncer issue

Guys, can anyone please explain, or point me to a link where I can understand, this pgbouncer output? What does each column of this table mean?

pgbouncer=# show mem;

     name     | size | used | free | memtotal
--------------+------+------+------+----------
 user_cache   |  184 |   12 |   77 |    16376
 db_cache     |  160 |    2 |  100 |    16320
 pool_cache   |  408 |    4 |   46 |    20400
 server_cache |  360 |  121 |  279 |   144000
 client_cache |  360 | 1309 |  291 |   576000
 iobuf_cache  | 2064 |    3 |  797 |  1651200

Thanks

Prabhjot

 

 

 

Ian Pushee | 19 Jun 16:34 2015

Slow query (planner insisting on using 'external merge' sort type)

Hi Folks,

This is my first time posting here, so hopefully I manage to convey all 
the information needed.
We have a simple query that just started giving us problems in 
production when the number of rows gets too large (>100k).
The issue seems to be that the planner wants to sort the rows using a 
sequential scan rather than the index provided specifically for this 
query. This isn't a problem with low numbers of rows, but eventually the 
query outgrows work_mem and spills to disk, slowing the query down 
greatly. I know the common answer is to increase work_mem... but since 
this table's growth is unpredictable, that isn't a viable strategy.
I've tried increasing shared_buffers and effective_cache_size, but that 
doesn't appear to affect the plan chosen here. Setting 
random_page_cost=1.0 works, but I'm hoping for a more general solution 
that doesn't require setting it locally each time I run the query. I 
guess my real question is whether or not there is any way to get the 
planner to take into account the fact that it's going to need to do an 
'external merge', and that it is going to take a LONG time?
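
For clarity, the per-query workaround I mean looks roughly like this (a sketch; 
SET LOCAL keeps the override scoped to a single transaction):

BEGIN;
SET LOCAL random_page_cost = 1.0;
-- ... run the GROUP BY query shown below ...
COMMIT;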

Table and Index Schemas:
CREATE TABLE events
(
   id serial NOT NULL,
   name character varying(64),
   eventspy_id integer NOT NULL,
   camera_id integer NOT NULL,
   start_time timestamp without time zone NOT NULL,
   millisecond smallint NOT NULL,
   uid smallint NOT NULL,
   update_time timestamp without time zone NOT NULL DEFAULT now(),
   length integer NOT NULL,
   objects text NOT NULL,
   priority smallint NOT NULL,
   type character varying(45) NOT NULL DEFAULT 'alarm'::character varying,
   status event_status NOT NULL DEFAULT 'new'::event_status,
   confidence smallint NOT NULL DEFAULT 100::smallint,
   CONSTRAINT events_pkey PRIMARY KEY (id)
)
WITH (
   OIDS=FALSE
);

CREATE INDEX events_confidnce
   ON events
   USING btree
   (confidence);

CREATE INDEX events_summary
   ON events
   USING btree
   (name COLLATE pg_catalog."default", eventspy_id, camera_id, type 
COLLATE pg_catalog."default", status);

Query:
SELECT name, type, eventspy_id, camera_id, status, COUNT(id), 
MAX(update_time), MIN(start_time), MAX(start_time) FROM events WHERE 
confidence>=0 GROUP BY name, eventspy_id, camera_id, type, status;

Explain Analyze outputs (links as requested):
Default plan: http://explain.depesz.com/s/ib3k
Forced index (random_page_cost=1.0): http://explain.depesz.com/s/lYaP

Software/Hardware: PGSql 9.2.1, Windows 8.1, 8GB RAM
All pgsql settings are at their defaults.

Thanks for any help you can provide,
-Ian Pushee


Duane Murphy | 18 Jun 21:38 2015

Techniques to Avoid Temp Files

We are trying to improve performance by avoiding temp file creation.

LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp8068.125071", size 58988604
STATEMENT: SELECT iiid.installed_item__id, item_detail.id, item_detail.model_id, item_detail.type
FROM installed_item__item_detail AS iiid
INNER JOIN item_detail ON iiid.item_detail__id = item_detail.id
INNER JOIN item ON (item.installed_item__id = iiid.installed_item__id )
INNER JOIN model ON (item.id = model.item__id AND model.id = $1)

Our hypothesis is that the temp file creation is caused by the high row count of the
installed_item__item_detail table.

installed_item__item_detail: 72916824 rows (27 GB)
item_detail:                 59212436 rows (40 GB)

The other two tables, item and model, are temporary tables created during this particular process. Unfortunately, I don't have those table sizes.

What are the causes of temp file creation? In general, temp files are created when the data being sorted or merged will not fit in work_mem. What can I do to reduce the amount of data being merged? Is the simple fact that the tables have millions of rows going to force an on-disk merge sort?

I noticed that this query selects from installed_item__item_detail rather than from item_detail, which seems like it would also work. Would that change make a positive difference?

installed_item__item_detail is a simple join table. The installed_item__id side cannot be reduced. Would reducing the number of item_detail rows via additional joins help?

What additional information can I gather in order to have a better understanding of how to improve this query?

(Unfortunately we do not have (easy) access to this particular database in order to experiment.)
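
If we do get temporary access, would running the statement under EXPLAIN (ANALYZE, BUFFERS) be the right way to see which node spills? (A sketch only; the literal 12345 below is made up and stands in for the $1 parameter.)

-- Nodes that spill report "Sort Method: external merge  Disk: ...kB" or,
-- for hash joins, "Batches: N" with N > 1.
EXPLAIN (ANALYZE, BUFFERS)
SELECT iiid.installed_item__id, item_detail.id, item_detail.model_id, item_detail.type
FROM installed_item__item_detail AS iiid
INNER JOIN item_detail ON iiid.item_detail__id = item_detail.id
INNER JOIN item ON item.installed_item__id = iiid.installed_item__id
INNER JOIN model ON item.id = model.item__id AND model.id = 12345;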

 ...Duane

Background information:

=> select version();
                                                   version
--------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.1.9 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3), 64-bit

$uname -a
Linux host.name.com 2.6.32-358.6.2.el6.x86_64 #1 SMP Thu May 16 20:59:36 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

=> select name, current_setting(name), source from pg_settings where source not in ('default', 'override');
             name             |  current_setting   |        source
------------------------------+--------------------+----------------------
 application_name             | psql               | client
 checkpoint_completion_target | 0.9                | configuration file
 checkpoint_segments          | 128                | configuration file
 client_encoding              | UTF8               | client
 DateStyle                    | ISO, MDY           | configuration file
 default_statistics_target    | 100                | configuration file
 default_text_search_config   | pg_catalog.english | configuration file
 effective_cache_size         | 512MB              | configuration file
 lc_messages                  | en_US.UTF-8        | configuration file
 lc_monetary                  | en_US.UTF-8        | configuration file
 lc_numeric                   | en_US.UTF-8        | configuration file
 lc_time                      | en_US.UTF-8        | configuration file
 log_autovacuum_min_duration  | 1s                 | configuration file
 log_destination              | stderr,syslog      | configuration file
 log_line_prefix              | [%m]:              | configuration file
 log_min_duration_statement   | 5min               | configuration file
 log_min_error_statement      | notice             | configuration file
 log_rotation_age             | 1d                 | configuration file
 log_rotation_size            | 0                  | configuration file
 log_temp_files               | 1MB                | configuration file
 log_timezone                 | US/Pacific         | environment variable
 log_truncate_on_rotation     | on                 | configuration file
 logging_collector            | on                 | configuration file
 maintenance_work_mem         | 384MB              | configuration file
 max_connections              | 100                | configuration file
 max_stack_depth              | 2MB                | environment variable
 port                         | 5432               | command line
 shared_buffers               | 256MB              | configuration file
 syslog_facility              | local0             | configuration file
 TimeZone                     | US/Pacific         | environment variable
 wal_buffers                  | 1MB                | configuration file
 work_mem                     | 128MB              | configuration file
(32 rows)

Irineu Ruiz | 18 Jun 19:52 2015

How to calculate statistics for one column

Hi,

I have a table with an irregular distribution of values in a foreign key column, as you can see at the end of this e-mail.

Sometimes, in simple joins with other tables on the same id_camada (but not with the table that owns the foreign key), the planner chooses a seq scan instead of using the index on id_camada.
If the join also includes the table that owns the foreign key, then the index is used.

In the first case, queries with the seq scan take about 30 seconds, while with the index they take about 40 ms.

When I increase the statistics target of the column id_camada to 900, the index is used in both cases.
My question is: is there a way to determine the best statistics target for this column, or is it a process of trial and error?
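
For reference, the statistics change I made was essentially this (a sketch; "my_table" is a placeholder, since the real table name isn't shown here):

ALTER TABLE my_table ALTER COLUMN id_camada SET STATISTICS 900;
ANALYZE my_table (id_camada);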

id_camada;count(*)
123;10056782
83;311471
42;11316
367;5564
163;3362
257;2100
89;1725
452;1092
157;904
84;883
233;853
271;638
272;620
269;548
270;485
455;437
255;427
32;371
39;320
31;309
411;291
91;260
240;251
162;250
444;247
165;227
36;215
236;193
54;185
53;175
76;170
412;153
159;140
160;139
105;130
59;117
60;117
267;115
238;112
279;111
465;111
5;107
74;103
243;98
35;96
68;82
400;78
391;75
49;74
124;68
73;66
260;64
66;62
168;60
172;56
4;54
44;54
384;53
237;53
390;52
234;52
387;51
378;51
148;50
64;50
379;47
56;46
52;46
377;46
443;46
253;45
97;45
280;43
77;43
2;40
376;39
45;38
235;36
231;36
413;36
241;36
232;34
388;32
101;32
249;32
99;32
100;32
69;32
125;31
166;30
65;29
433;29
149;28
96;27
71;27
98;26
67;26
386;25
50;24
21;24
122;24
47;24
291;22
287;22
404;22
70;22
48;21
63;21
153;18
13;18
46;18
262;18
43;17
72;17
161;17
344;15
29;15
439;14
104;14
119;13
456;12
434;12
55;10
3;10
345;10
286;10
15;10
141;9
169;9
258;9
18;9
158;9
14;8
94;8
463;8
218;8
92;8
170;8
58;7
17;7
19;7
6;7
414;7
10;7
7;7
22;7
90;6
430;6
27;6
195;6
16;6
223;6
11;6
242;6
9;6
26;5
57;5
82;5
451;5
61;5
8;5
445;5
140;5
431;5
197;5
20;5
362;5
24;5
385;4
23;4
25;4
62;4
134;4
150;4
215;4
217;4
219;4
220;4
222;4
224;4
244;4
284;4
318;4
389;4
415;4
449;4
461;4
93;3
209;3
136;3
299;3
188;3
319;3
264;3
95;3
337;3
1;3
221;3
310;3
143;2
320;2
321;2
322;2
324;2
210;2
302;2
438;2
303;2
239;2
330;2
196;2
447;2
332;2
333;2
334;2
307;2
308;2
309;2
340;2
341;2
171;2
190;2
313;2
193;2
154;2
294;2
295;2
250;2
144;2
311;1
312;1
314;1
315;1
316;1
317;1
51;1
323;1
325;1
326;1
327;1
328;1
329;1
331;1
335;1
336;1
338;1
339;1
342;1
343;1
186;1
185;1
354;1
355;1
356;1
357;1
359;1
360;1
361;1
184;1
363;1
364;1
366;1
183;1
369;1
370;1
182;1
181;1
180;1
179;1
380;1
381;1
382;1
383;1
178;1
177;1
176;1
174;1
30;1
173;1
392;1
393;1
155;1
405;1
407;1
409;1
151;1
145;1
12;1
425;1
138;1
135;1
103;1
435;1
437;1
102;1
440;1
441;1
442;1
80;1
448;1
28;1
226;1
227;1
228;1
230;1
225;1
214;1
216;1
213;1
212;1
211;1
208;1
207;1
206;1
78;1
245;1
205;1
204;1
254;1
203;1
202;1
201;1
200;1
199;1
265;1
198;1
268;1
194;1
192;1
273;1
274;1
275;1
278;1
191;1
282;1
75;1
285;1
189;1
288;1
289;1
290;1
187;1
293;1
296;1
297;1
300;1
304;1
305;1
306;1



Sheena, Prabhjot | 18 Jun 19:09 2015

PGBOUNCER ISSUE, PLEASE HELP (slowing down the site)

Guys

I have an issue going on with PGBOUNCER which is slowing down the site.

 

PGBOUNCER VERSION: pgbouncer-1.5.4-2.el6 (hosted on a separate machine; 16 CPUs, 98GB RAM)

DATABASE VERSION: postgresql 9.3

 

When the total number of client connections to pgbouncer is close to 1000, the site application works fine, but when the total crosses 1150 the application starts showing slowness.

 

Here is an example of output

 

postgres@symds-pg:~ $ netstat -atnp | grep 5432 | wc

(Not all processes could be identified, non-owned process info

will not be shown, you would have to be root to see it all.)

    960    6720  104640

 

 

As you can see, total connections are around 960 right now and my site application is working fine. When connections cross 1150 the application gets slow, even though I see a lot of available connections because my default_pool_size is set high, to 250. Database performance on the other end is great, with no slow-running queries or anything, so the only place I can think the issue lies is on the PGBOUNCER end.
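
Would the per-pool counters from the pgbouncer admin console (the same console as the output below) be the right thing to capture when the slowdown hits? As I understand it, cl_waiting and maxwait in SHOW POOLS indicate clients queued because every server connection in that pool is busy. (Just a sketch of what I plan to collect.)

pgbouncer=# SHOW POOLS;
pgbouncer=# SHOW STATS;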

 

pgbouncer=# show config;

            key            |              value               | changeable
---------------------------+----------------------------------+------------
 job_name                  | pgbouncer                        | no
 conffile                  | /etc/pgbouncer/pgbouncer.ini     | yes
 logfile                   | /var/log/pgbouncer.log           | yes
 pidfile                   | /var/run/pgbouncer/pgbouncer.pid | no
 listen_addr               | *                                | no
 listen_port               | 5432                             | no
 listen_backlog            | 128                              | no
 unix_socket_dir           | /tmp                             | no
 unix_socket_mode          | 511                              | no
 unix_socket_group         |                                  | no
 auth_type                 | md5                              | yes
 auth_file                 | /etc/pgbouncer/userlist.txt      | yes
 pool_mode                 | transaction                      | yes
 max_client_conn           | 3000                             | yes
 default_pool_size         | 250                              | yes
 min_pool_size             | 0                                | yes
 reserve_pool_size         | 0                                | yes
 reserve_pool_timeout      | 5                                | yes
 syslog                    | 0                                | yes
 syslog_facility           | daemon                           | yes
 syslog_ident              | pgbouncer                        | yes
 user                      |                                  | no
 autodb_idle_timeout       | 3600                             | yes
 server_reset_query        |                                  | yes
 server_check_query        | select 1                         | yes
 server_check_delay        | 30                               | yes
 query_timeout             | 0                                | yes
 query_wait_timeout        | 0                                | yes
 client_idle_timeout       | 0                                | yes
 client_login_timeout      | 60                               | yes
 idle_transaction_timeout  | 0                                | yes
 server_lifetime           | 3600                             | yes
 server_idle_timeout       | 600                              | yes
 server_connect_timeout    | 15                               | yes
 server_login_retry        | 15                               | yes
 server_round_robin        | 0                                | yes
 suspend_timeout           | 10                               | yes
 ignore_startup_parameters | extra_float_digits               | yes
 disable_pqexec            | 0                                | no
 dns_max_ttl               | 15                               | yes
 dns_zone_check_period     | 0                                | yes
 max_packet_size           | 2147483647                       | yes
 pkt_buf                   | 2048                             | no
 sbuf_loopcnt              | 5                                | yes
 tcp_defer_accept          | 1                                | yes
 tcp_socket_buffer         | 0                                | yes
 tcp_keepalive             | 1                                | yes
 tcp_keepcnt               | 0                                | yes
 tcp_keepidle              | 0                                | yes
 tcp_keepintvl             | 0                                | yes
 verbose                   | 0                                | yes
 admin_users               | postgres                         | yes
 stats_users               | stats, postgres                  | yes
 stats_period              | 60                               | yes
 log_connections           | 1                                | yes
 log_disconnections        | 1                                | yes
 log_pooler_errors         | 1                                | yes

 

 

Thanks

Prabhjot

 

Joshua D. Drake | 13 Jun 19:43 2015

Re: Do work_mem and shared buffers have 1g or 2g limit on 64 bit linux?


On 06/13/2015 10:27 AM, Kaijiang Chen wrote:
> Hi, I am using postgresql 9.2.10 on centos 6.2, 64 bit version. The
> server has 512 GB mem.
>
> The jobs are mainly OLAP like. So I need larger work_mem and shared
> buffers. From the source code, there is a constant MaxAllocSize==1GB.
> So, I wonder whether work_mem and shared buffers can exceed 2GB in the
> 64 bit Linux server?

Shared Buffers is not limited.

Work_mem IIRC can go past 2GB but has never been proven to be effective 
after that.

It does depend on the version you are running.
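
As a quick check (a sketch only; whether a value that large actually helps 
is another matter, and limits vary by platform and version as noted), a 
64-bit Linux build should accept a session-level setting above 2GB:

SET work_mem = '4GB';
SHOW work_mem;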

JD

>
> Thanks and regards,
> Kaijiang
>

-- 
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing "I'm offended" is basically telling the world you can't
control your own emotions, so everyone else should do it for you.


Kaijiang Chen | 13 Jun 19:27 2015

Do work_mem and shared buffers have 1g or 2g limit on 64 bit linux?

Hi, I am using PostgreSQL 9.2.10 on CentOS 6.2, 64-bit. The server has 512 GB of memory.


The jobs are mainly OLAP-like, so I need larger work_mem and shared_buffers. From the source code, there is a constant MaxAllocSize == 1GB, so I wonder: can work_mem and shared_buffers exceed 2GB on a 64-bit Linux server?

Thanks and regards,
Kaijiang

Michael Nolan | 12 Jun 22:37 2015

Are there tuning parameters that don't take effect immediately?

Last night I was doing some tuning on a database. The longest query I was running was taking around 160 seconds. I didn't see much change in the running time for that query, even after restarting PG.

Today, with roughly the same system load (possibly even a bit heavier), that query is running in about 40 seconds.

Are there tuning parameters in postgresql.conf that don't take effect right away, even after a restart of PG? The only thing I can come up with that's happened since last night is that we ran the nightly vacuum analyze on that database, but I did not change the statistics target.

The parameters I was working with were:

effective_cache_size
shared_buffers
temp_buffers
work_mem
maintenance_work_mem
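
One quick way to check this (a sketch) is the context column in pg_settings, which says when a change to each parameter takes effect: 'postmaster' means a restart is required, 'sighup' means a reload is enough, and 'user'/'superuser' settings apply per session.

SELECT name, setting, unit, context
FROM pg_settings
WHERE name IN ('effective_cache_size', 'shared_buffers', 'temp_buffers',
               'work_mem', 'maintenance_work_mem');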

Looking at the output of the free command, I see a lot more memory being used for buffer/cache today. (CentOS 7.)
--
Mike Nolan
Sasa Vilic | 12 Jun 02:18 2015

Slow query: Postgres chooses nested loop over hash join, even though the hash join is much faster; wrong number of rows estimated

Hi,

I have a query that takes ridiculously long to complete (over 500 ms), but 
if I disable nested loops it runs really fast (24.5 ms).

Here are links for
* first request (everything enabled): http://explain.depesz.com/s/Q1M
* second request (nested loop disabled): http://explain.depesz.com/s/9ZY

I have also noticed, that setting

set join_collapse_limit = 1;

produces similar results as when nested loops are disabled.

Autovacuum is running, and I manually performed both ANALYZE and 
VACUUM ANALYZE. No effect.

I tried increasing statistics for columns (slot, path_id, key) to 5000 
for table data. No effect.

I tried increasing statistics for columns (id, parent, key) to 5000 for 
table path. No effect.

I can see that Postgres is estimating the row count incorrectly, but 
I can't figure out why.
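
For reference, these are the session-level experiments mentioned above (run 
before the query, for testing only; I don't intend to disable nested loops 
globally):

SET enable_nestloop = off;     -- produces the fast hash-join plan
-- or, alternatively:
SET join_collapse_limit = 1;   -- fixes the textual join order, similar result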

The path table is used to represent a tree-like structure.

== QUERY ==

SELECT p1.value as request_type, p2.value as app_id, p3.value as app_ip, 
p3.id as id, data.*, server.name
FROM data
INNER JOIN path p3 ON data.path_id = p3.id
INNER JOIN server on data.server_id = server.id
INNER JOIN path p2 on p2.id = p3.parent
INNER JOIN path p1 on p1.id = p2.parent
WHERE data.slot between '2015-02-18 00:00:00' and '2015-02-19 00:00:00'
     AND p1.key = 'request_type' AND p2.key = 'app_id' AND p3.key = 'app_ip'
;

== TABLES ==
                                              Table "public.path"
  Column |         Type          | Modifiers                     | 
Storage  | Description
--------+-----------------------+---------------------------------------------------+----------+-------------
  id     | integer               | not null default 
nextval('path_id_seq'::regclass) | plain    |
  parent | integer |                                                   | 
plain    |
  key    | character varying(25) | not 
null                                          | extended |
  value  | character varying(50) | not 
null                                          | extended |
Indexes:
     "path_pkey" PRIMARY KEY, btree (id)
     "path_unique" UNIQUE CONSTRAINT, btree (parent, key, value)
Foreign-key constraints:
     "path.fg.parent->path(id)" FOREIGN KEY (parent) REFERENCES path(id)
Referenced by:
     TABLE "data" CONSTRAINT "data_fkey_path" FOREIGN KEY (path_id) 
REFERENCES path(id)
     TABLE "path" CONSTRAINT "path.fg.parent->path(id)" FOREIGN KEY 
(parent) REFERENCES path(id)
Has OIDs: no

                                Table "public.data"
   Column   |              Type              | Modifiers | Storage  | 
Description
-----------+--------------------------------+-----------+----------+-------------
  slot      | timestamp(0) without time zone | not null  | plain    |
  server_id | integer                        | not null  | plain    |
  path_id   | integer                        | not null  | plain    |
  key       | character varying(50)          | not null  | extended |
  value     | real                           | not null  | plain    |
Indexes:
     "data_pkey" PRIMARY KEY, btree (slot, server_id, path_id, key)
Foreign-key constraints:
     "data_fkey_path" FOREIGN KEY (path_id) REFERENCES path(id)
Has OIDs: no

svilic=> select count(*) from path;
  count
-------
    603

svilic=> select count(*) from path p1 inner join path p2 on p1.id = 
p2.parent inner join path p3 on p2.id = p3.parent where p1.parent is null;
  count
-------
    463

svilic=> select count(*) from server;
  count
-------
     37

svilic=> select count(*) from data;
   count
----------
  23495552

svilic=> select version();
                                                   version
-------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.1.17 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit

== SERVER CONFIGURATION ==

shared_buffers = 512MB
work_mem = 8MB (I have tried changing it to 32, 128 and 512, no effect)
maintenance_work_mem = 64MB
checkpoint_segments = 100
random_page_cost = 4.0
effective_cache_size = 3072MB

== HARDWARE CONFIGURATION ==

cpu: Intel(R) Core(TM) i3-2100 CPU @ 3.10GHz (4 cores)
mem: 8GB
system uses regular disks (no RAID and no SSD)


