Ravi Krishna | 28 May 16:31 2015

Partitioning and performance

I am testing partitioning of a large table. I am doing range
partitioning based on a sequence column, which also acts as the primary
key. For inserts I am using a trigger which redirects each insert to
the correct child table based on the value of the primary key.
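
(For reference, the kind of setup being described looks roughly like the
sketch below; the child table names, range boundaries, and column types
are illustrative rather than the actual schema.)

CREATE TABLE account (
    account_row_inst bigserial PRIMARY KEY,
    payload          text
);

CREATE TABLE account_p1 (
    CHECK (account_row_inst >= 1       AND account_row_inst < 1000000)
) INHERITS (account);

CREATE TABLE account_p2 (
    CHECK (account_row_inst >= 1000000 AND account_row_inst < 2000000)
) INHERITS (account);

CREATE OR REPLACE FUNCTION account_insert_trigger() RETURNS trigger AS $$
BEGIN
    IF NEW.account_row_inst < 1000000 THEN
        INSERT INTO account_p1 VALUES (NEW.*);
    ELSE
        INSERT INTO account_p2 VALUES (NEW.*);
    END IF;
    RETURN NULL;  -- the row has been routed to a child, so skip the parent
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER account_insert_redirect
    BEFORE INSERT ON account
    FOR EACH ROW EXECUTE PROCEDURE account_insert_trigger();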

Based on my testing, I see that insert speed differs by less than 10%
from a non-partitioned table. I am using SET
constraint_exclusion = on, and I verified via EXPLAIN that the
planner does not consider non-qualifying child tables.

Yet selects and updates based on the primary key are anywhere from
40% to 200% slower than on the non-partitioned table. One thing I notice
is that, even with partition pruning, the planner scans the base table
as well as the child table matching the condition. Is that the source of
the additional overhead?

I am attaching the EXPLAIN output below.

===========================
On a non partitioned table

explain select count(*) from tstesting.account where account_row_inst = 101 ;
 Aggregate  (cost=8.16..8.17 rows=1 width=0)
   ->  Index Only Scan using account_pkey on account  (cost=0.14..8.16 rows=1 width=0)
         Index Cond: (account_row_inst = 101)
(3 rows)

With partition pruning:

Aggregate (cost=8.45..8.46 rows=1 width=0)

Josh Berkus | 28 May 00:02 2015

Re: ERROR: missing chunk number 0 for toast value 1821556134 in pg_toast_17881

On 05/27/2015 12:50 AM, Tory M Blue wrote:
> Greetings and salutations.
> 
> I've got some weirdness. 
> 
> Current:
> Postgres 9.3.4 
> Slony 2.2.3
> CentOS 6.5
> 
> Prior running Postgres 9.1.2 w/slony 2.1.3 CentOS 6.2
> 
> I found that if I tried to run a vacuum full on 1 table that I recently
> reindexed (out of possibly 8 tables) that I get this error:
> 
> # vacuum full table.ads;
> 
> ERROR:  missing chunk number 0 for toast value 1821556134 in pg_toast_17881
> 
> If I run a vacuum analyze it completes fine, but I can't run a vacuum
> full without it throwing an error. I seem to be able to query the table
> and I seem to be able to add data to the table and slony seems fine as
> does postgres. 

Is this happening on the Slony master or the replica?

There have been several bugs since 9.1.2 which could have caused this
particular error.  Are you certain that this is a recent problem?

Note that this error affects just one compressed value or row, so you're

Wes Vaske | 27 May 22:24 2015

Fastest Backup & Restore for perf testing

Hi,

I’m running performance tests against a PostgreSQL database (9.4) with various hardware configurations and a couple different benchmarks (TPC-C & TPC-H).

I’m currently using pg_dump and pg_restore to refresh my dataset between runs but this process seems slower than it could be.

Is it possible to do a tar/untar of the entire /var/lib/pgsql tree as a backup & restore method?

If not, is there another way to restore a dataset more quickly? The database is dedicated to the test dataset so trashing & rebuilding the entire application/OS/anything is no issue for me—there’s no data for me to lose.

Thanks!

Wes Vaske | Senior Storage Solutions Engineer

Micron Technology

101 West Louis Henna Blvd, Suite 210 | Austin, TX 78728

 

Tory M Blue | 27 May 09:50 2015

ERROR: missing chunk number 0 for toast value 1821556134 in pg_toast_17881

Greetings and salutations.

I've got some weirdness. 

Current:
Postgres 9.3.4 
Slony 2.2.3
CentOS 6.5

Prior running Postgres 9.1.2 w/slony 2.1.3 CentOS 6.2

I found that if I try to run a vacuum full on one table that I recently reindexed (out of possibly 8 tables), I get this error:

# vacuum full table.ads;

ERROR:  missing chunk number 0 for toast value 1821556134 in pg_toast_17881

If I run a vacuum analyze it completes fine, but I can't run a vacuum full without it throwing an error. I can still query the table, I can still add data to it, and both Slony and Postgres seem fine.

I'm unclear why the vacuum full is failing with this error. I've done some searching and there are hints of prior bugs, but I didn't catch anything between 9.3.3 and 9.3.7 that talks about this.

My next step, without your fine assistance, will be to drop the table from Slony and re-add it (meaning it will drop the table completely from this DB and recreate it from the master, where we can do a vacuum full without failure).

I have already tried dropping and recreating the indexes, but no luck.

Ideas?

Thanks

Tory



Dave Johansen | 23 May 00:27 2015

MAX() and multi-column index on a partitioned table?

I'm trying to call MAX() on the first column of a multi-column index of a partitioned table, and the planner is choosing to do a sequential scan instead of an index scan. Is there something I can do to fix this?

Here's a simplified version of our schema:
CREATE TABLE data ( tutci DOUBLE PRECISION, tutcf DOUBLE PRECISION, value INTEGER );
CREATE TABLE data1 ( CHECK ( tutci >= 1000 AND tutci < 2000 ) ) INHERITS (data);
CREATE TABLE data2 ( CHECK ( tutci >= 2000 AND tutci < 3000 ) ) INHERITS (data);
With the following indexes:
CREATE INDEX data_tutc_index ON data(tutci, tutcf);
CREATE INDEX data1_tutc_index ON data1(tutci, tutcf);
CREATE INDEX data2_tutc_index ON data2(tutci, tutcf);

No data is stored in the parent table (only in the partitions). After populating with simple test data, doing a CLUSTER on the index, and running VACUUM ANALYZE, the EXPLAIN output is as follows:
EXPLAIN SELECT MAX(tutci) FROM data;
                                 QUERY PLAN                                
----------------------------------------------------------------------------
 Aggregate  (cost=408.53..408.54 rows=1 width=8)
   ->  Append  (cost=0.00..354.42 rows=21642 width=8)
         ->  Seq Scan on data  (cost=0.00..26.30 rows=1630 width=8)
         ->  Seq Scan on data1 data  (cost=0.00..164.11 rows=10011 width=8)
         ->  Seq Scan on data2 data  (cost=0.00..164.01 rows=10001 width=8)
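
(A formulation that can often use the per-partition (tutci, tutcf) indexes
on this kind of inheritance setup, depending on the server version, is to
ask for the largest value directly instead of through MAX(); shown here
only as a sketch:)

-- assumes tutci is never NULL; otherwise add WHERE tutci IS NOT NULL
EXPLAIN SELECT tutci FROM data ORDER BY tutci DESC LIMIT 1;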

Thanks,
Dave

Josh Berkus | 22 May 19:34 2015

Re: PostgreSQL disk fragmentation causes performance problems on Windows

On 05/21/2015 01:39 PM, Andres Freund wrote:
> On 2015-05-21 11:54:40 -0700, Josh Berkus wrote:
>> This has been talked about as a feature, but would require major work on
>> PostgreSQL to make it possible.  You'd be looking at several months of
>> effort by a really good hacker, and then a whole bunch of performance
>> testing.  If you have the budget for this, then please let's talk about
>> it because right now nobody is working on it.
> 
> I think this is overestimating the required effort quite a bit. While
> not trivial, it's also not that complex to make this work.

Well, then, maybe someone should hack and test it ...

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Josh Berkus | 21 May 20:54 2015

Re: PostgreSQL disk fragmentation causes performance problems on Windows

On 04/23/2015 12:47 PM, Jan Gunnar Dyrset wrote:
> I think that preallocating lumps of a given, configurable size, say 4
> MB, for the tables would remove this problem. The max number of
> fragments on a 1 GB file would then be  250, which is no problem. Is
> this possible to configure in PostgreSQL? If not, how difficult is it to
> implement in the database?

It is not currently possible to configure.

This has been talked about as a feature, but would require major work on
PostgreSQL to make it possible.  You'd be looking at several months of
effort by a really good hacker, and then a whole bunch of performance
testing.  If you have the budget for this, then please let's talk about
it because right now nobody is working on it.

Note that this could be a dead end; it's possible that preallocating
large extents could cause worse problems than the current fragmentation
issues.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Florian Lohoff | 21 May 12:41 2015

union all and filter / index scan -> seq scan


Hi,
I stumbled over something I can't seem to find a workaround for. I create a view like

create view v_test as
	select	a,b
	from	big_table
	union all
	select	a,b
	from	small_table;

When I now use the view like

	select * from v_test where a = 42;

I can see an index scan happening on big_table. When I issue something like

	select * from v_test where a in ( select 42 );

or join to another table, I see that there will be a seq scan on big_table:
first the union is executed, and only afterwards is the filter, e.g. a in (
select 42 ), applied to the huge result. My use case is that
big_table has >70 million rows and is growing fast, while small_table has
around 4 rows and grows little. The filter, e.g. "a in ( select 42 )", will
typically select 50-1000 of the 70 million rows. So I now build a union
of 70 million + 4 rows only to then filter it all with a = 42.

It seems the planner is not able to push the filter down through the
UNION ALL; e.g. the above statement could be rewritten from:

	select	*
	from	(
		select	a,b
		from	big_table
		union all
		select	a,b
		from	small_table
		) foo
	where	a in ( select 42 );

to 

	select	*
	from	(
		select	a,b
		from	big_table
		where a in ( select 42 )
		union all
		select	a,b
		from	small_table
		where a in ( select 42 )
		) foo

which would then use an index scan instead of a seq scan, and execution
times would be acceptable.

I have now tried to wrap my head around the problem for two days. I am
unable to find a way to avoid the union, and this filter optimisation
seems impossible with a view construct.
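
(One workaround pattern that may help is to move the union into a
set-returning SQL function that takes the filter value as a parameter, so
the filter is applied inside each branch. The function name and the
driving table below are illustrative, and the LATERAL form needs 9.3 or
later.)

create function v_test_lookup(p_a integer) returns setof v_test
language sql stable as $$
	select	a,b
	from	big_table
	where	a = p_a
	union all
	select	a,b
	from	small_table
	where	a = p_a;
$$;

-- single value:
select * from v_test_lookup(42);

-- set of values (9.3+), driven by another table:
select	t.*
from	driver_table d,
	lateral v_test_lookup(d.a) t;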

Flo
PS: Postgres 9.1; I tried 9.4 on Debian/jessie with, IIRC, the same results.
--

-- 
Florian Lohoff                                                 f <at> zz.de
     We need to self-defense - GnuPG/PGP enable your email today!
Muthusamy, Sivaraman | 11 May 11:55 2015

How to clean/truncate / VACUUM FULL pg_largeobject without (much) downtime?

Hi Group,

I am facing a problem where pg_catalog.pg_largeobject has been growing fast recently, in the last two weeks. The actual data itself, in user tables, is about 60GB, but the pg_catalog.pg_largeobject table is 200GB plus. Please let me know how to clean/truncate this table without losing any user data in other tables.

With regard to pg_largeobject, I have the following questions:

- What is pg_largeobject?
- What does it contain? I tried the PostgreSQL documentation and lists, but could not get much from them.
- Why does it grow?
- Was there any configuration change that may have triggered this growth? For the last year or so there was no problem, but it started growing all of a sudden in the last two weeks. The only changes we made in the last two weeks were scheduling a nightly base backup and enabling the autovacuum feature.
- pg_largeobject contains many rows with duplicate loid values. Though there are only about 0.6 million distinct LOIDs, the total number of rows including duplicates is about 59 million. What are all these?

Kindly help me get this information and get this issue cleared; I appreciate your quick help on this.
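
(For reference, figures like the ones above can be checked with queries
along these lines. pg_largeobject stores each large object as a series of
chunk rows that share a single loid, which is why the total row count is
much higher than the number of distinct LOIDs.)

-- total on-disk size of pg_largeobject, including its index
SELECT pg_size_pretty(pg_total_relation_size('pg_catalog.pg_largeobject'));

-- number of distinct large objects vs. total chunk rows
SELECT count(DISTINCT loid) AS large_objects,
       count(*)             AS chunk_rows
FROM   pg_catalog.pg_largeobject;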

 

Thanks and Regards

M.Shiva

Anton Bushmelev | 3 May 23:23 2015

optimization join on random value

Hello Postgres gurus, is it possible to tune a query with a join on a random value?
I know that it is not a real-life example, but I need it for tests.

soe=# explain
soe-#  SELECT   ADDRESS_ID,       
soe-#           CUSTOMER_ID,       
soe-#           DATE_CREATED,       
soe-#           HOUSE_NO_OR_NAME,       
soe-#           STREET_NAME,       
soe-#           TOWN,       
soe-#           COUNTY,       
soe-#           COUNTRY,       
soe-#           POST_CODE,       
soe-#           ZIP_CODE       
soe-#         FROM ADDRESSES      
soe-#         WHERE customer_id = trunc( random()*45000) ;
                                        QUERY PLAN                                         
-------------------------------------------------------------------------------------------
 Seq Scan on addresses  (cost=0.00..165714.00 rows=22500 width=84)
   Filter: ((customer_id)::double precision = trunc((random() * 45000::double precision)))
(2 rows)

soe=# \d addresses;
                  Table "public.addresses"                                                                                       
      Column      |            Type             | Modifiers                                                                      
------------------+-----------------------------+-----------                                                                     
 address_id       | bigint                      | not null                                                                       
 customer_id      | bigint                      | not null                                                                       
 date_created     | timestamp without time zone | not null                                                                       
 house_no_or_name | character varying(60)       |                                                                                
 street_name      | character varying(60)       |                                                                                
 town             | character varying(60)       |                                                                                
 county           | character varying(60)       |                                                                                
 country          | character varying(60)       |                                                                                
 post_code        | character varying(12)       |                                                                                
 zip_code         | character varying(12)       |                                                                                
Indexes:                                                                                                                         
    "addresses_pkey" PRIMARY KEY, btree (address_id)                                                                             
    "addresses_cust_ix" btree (customer_id)                                                                                      
Foreign-key constraints:                                                                                                         
    "add_cust_fk" FOREIGN KEY (customer_id) REFERENCES customers(customer_id) DEFERRABLE    



In Oracle the same query uses an index access path:

00:05:23 (1)c##bushmelev_aa <at> orcl> explain plan for
 SELECT   ADDRESS_ID,      
          CUSTOMER_ID,      
          DATE_CREATED,      
          HOUSE_NO_OR_NAME,      
          STREET_NAME,      
          TOWN,      
          COUNTY,      
          COUNTRY,      
          POST_CODE,      
          ZIP_CODE      
        FROM soe.ADDRESSES     
 WHERE customer_id = dbms_random.value ();

Explained.

Elapsed: 00:00:00.05
00:05:29 (1)c##bushmelev_aa <at> orcl> <at> utlxpls

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------
Plan hash value: 317664678

-----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                 |     2 |   150 |     5   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| ADDRESSES       |     2 |   150 |     5   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | ADDRESS_CUST_IX |     2 |       |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("CUSTOMER_ID"="DBMS_RANDOM"."VALUE"())
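
(The Filter line in the PostgreSQL plan above suggests why the index is not
used: random() is volatile, so the expression is re-evaluated for every row,
and customer_id gets cast to double precision, which defeats the bigint
index. A sketch of a formulation that typically allows an index scan, by
picking one random value in an uncorrelated subquery and comparing it as
bigint:)

EXPLAIN
SELECT address_id, customer_id, date_created, house_no_or_name,
       street_name, town, county, country, post_code, zip_code
FROM   addresses
WHERE  customer_id = (SELECT trunc(random() * 45000)::bigint);
-- the subquery is evaluated once per execution, so all matching rows
-- share the same randomly chosen customer_id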


David Osborne | 1 May 12:54 2015

Index Scan Backward Slow

Hi,

We have a query which finds the latest row_id for a particular code.

We've found a backwards index scan is much slower than a forward one, to the extent that disabling indexscan altogether actually improves the query time.

Can anyone suggest why this might be, and what's best to do to improve the query time?



dev=> \d table
               Table "public.table"
    Column    |              Type              | Modifiers 
--------------+--------------------------------+-----------
 row_id       | integer                        | 
 code         | character(2)                   | 
Indexes:
    "table_code_idx" btree (code)
    "table_row_idx" btree (row_id)

dev=> select count(*) from table;
  count  
---------
 6090254
(1 row)

dev=> select count(distinct(row_id)) from table;
  count  
---------
 5421022
(1 row)

dev=> select n_distinct from pg_stats where tablename='table' and attname='row_id';
 n_distinct 
------------
  -0.762951
(1 row)

dev=> show work_mem;
 work_mem  
-----------
 1249105kB
(1 row)

dev=> select version();
                                                   version                                                    
--------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.3.6 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.6.3 20120306 (Red Hat 4.6.3-2), 64-bit
(1 row)


The query in question:

dev=> explain (analyse,buffers)  select row_id as last_row_id from table where code='XX' order by row_id desc limit 1;
                                                                          QUERY PLAN                                                                          
--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..1.67 rows=1 width=4) (actual time=835.281..835.282 rows=1 loops=1)
   Buffers: shared hit=187961
   ->  Index Scan Backward using table_row_idx on table  (cost=0.43..343741.98 rows=278731 width=4) (actual time=835.278..835.278 rows=1 loops=1)
         Filter: (code = 'XX'::bpchar)
         Rows Removed by Filter: 4050971
         Buffers: shared hit=187961
 Total runtime: 835.315 ms
(7 rows)



So we can see it's doing a backwards index scan. Out of curiosity I tried a forward scan and it was MUCH quicker:

dev=> explain (analyse,buffers)  select row_id as first_row_id from table where code='XX' order by row_id asc limit 1;
                                                                    QUERY PLAN                                                                     
---------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..1.67 rows=1 width=4) (actual time=19.473..19.474 rows=1 loops=1)
   Buffers: shared hit=26730
   ->  Index Scan using table_row_idx on table  (cost=0.43..343741.98 rows=278731 width=4) (actual time=19.470..19.470 rows=1 loops=1)
         Filter: (code = 'XX'::bpchar)
         Rows Removed by Filter: 62786
         Buffers: shared hit=26730
 Total runtime: 19.509 ms
(7 rows)



I thought adding an index on row_id desc might be the answer, but it has little effect:

dev=> create index row_id_desc_idx on table(row_id desc);
CREATE INDEX
Time: 5293.812 ms

dev=> explain (analyse,buffers)  select row_id as last_row_id from table where code='XX' order by row_id desc limit 1;
                                                                         QUERY PLAN                                                                         
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..1.66 rows=1 width=4) (actual time=944.666..944.667 rows=1 loops=1)
   Buffers: shared hit=176711 read=11071
   ->  Index Scan using row_id_desc_idx on table  (cost=0.43..342101.98 rows=278731 width=4) (actual time=944.663..944.663 rows=1 loops=1)
         Filter: (code = 'XX'::bpchar)
         Rows Removed by Filter: 4050971
         Buffers: shared hit=176711 read=11071
 Total runtime: 944.699 ms
(7 rows)


In fact, disabling the index scan completely improves matters considerably:

dev=> drop index row_id_desc_idx;
DROP INDEX
dev=> set enable_indexscan to off;
SET   

dev=> explain (analyse,buffers)  select row_id as last_row_id from table where code='XX' order by row_id desc limit 1;
                                                                       QUERY PLAN                                                                       
--------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=74006.39..74006.39 rows=1 width=4) (actual time=183.997..183.998 rows=1 loops=1)
   Buffers: shared hit=14723
   ->  Sort  (cost=74006.39..74703.22 rows=278731 width=4) (actual time=183.995..183.995 rows=1 loops=1)
         Sort Key: row_id
         Sort Method: top-N heapsort  Memory: 25kB
         Buffers: shared hit=14723
         ->  Bitmap Heap Scan on table  (cost=5276.60..72612.74 rows=278731 width=4) (actual time=25.533..119.320 rows=275909 loops=1)
               Recheck Cond: (code = 'XX'::bpchar)
               Buffers: shared hit=14723
               ->  Bitmap Index Scan on table_code_idx  (cost=0.00..5206.91 rows=278731 width=0) (actual time=23.298..23.298 rows=275909 loops=1)
                     Index Cond: (code = 'XX'::bpchar)
                     Buffers: shared hit=765
 Total runtime: 184.043 ms
(13 rows)
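
(For comparison, a two-column index led by code should let this query be
answered directly from the index instead of filtering millions of
non-matching rows; a sketch, using the anonymised table name from above
and an illustrative index name:)

CREATE INDEX table_code_row_id_idx ON "table" (code, row_id);

-- with code as the leading column, a backward scan over the index only
-- visits the 'XX' entries, so the LIMIT 1 is satisfied almost immediately
SELECT row_id AS last_row_id
FROM   "table"
WHERE  code = 'XX'
ORDER  BY row_id DESC
LIMIT  1;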


Thanks in advance for any help.

Regards,
--
David Osborne
Qcode Software Limited

