Laurence Parry | 20 Apr 06:30 2014

Workaround: Planner preference for tsquery filter vs. GIN index in fast text search

As mentioned here and elsewhere (most recently in "How can I get the query 
planner to use a bitmap index scan instead of an index scan?" - 8 Mar 
2014), estimation of the relative cost of text search operations using 
GIN-indexed columns sometimes goes awry, particularly when there will be a 
large number of matches.

The planner may choose to use a sequential or unrelated index scan with @@
as a filter, especially when incorporated as a subquery, incurring 
significant cost (even without considering de-TOASTing). Pre-tsvectorizing 
the column offers only a slight mitigation and can cause regressions (if 
nothing else, it adds another large column).

What worked for me (and I'm hoping for others, though YMMV) was adding 
'OFFSET 0' to the subquery involving the indexed column, e.g.

...
(SELECT sk1.submission_id
FROM submission_keywords sk1, keywords k1
WHERE sk1.keyword_id = k1.keyword_id
    AND to_tsvector('english_nostop', k1.keyword)
        @@ to_tsquery('english_nostop', 'tails')
OFFSET 0)
...

The result is a bitmap scan:
------------------------------------------------------------------------------------------
Nested Loop
(cost=8.73..4740.29 rows=21348 width=4)
(actual time=0.621..13.661 rows=20097 loops=1)
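
For context, a minimal sketch of how such a fenced subquery typically sits inside the outer query. The outer "submissions" table and the IN (...) framing are assumptions, not the original query; OFFSET 0 acts as an optimization fence, so the planner plans the text-search subquery on its own and can keep the GIN bitmap scan instead of pushing the @@ condition down as a row-by-row filter.

-- Sketch only: "submissions" and the IN (...) framing are assumptions.
SELECT s.submission_id
FROM submissions s
WHERE s.submission_id IN (
    SELECT sk1.submission_id
    FROM submission_keywords sk1, keywords k1
    WHERE sk1.keyword_id = k1.keyword_id
      AND to_tsvector('english_nostop', k1.keyword)
          @@ to_tsquery('english_nostop', 'tails')
    OFFSET 0
);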

Gary Warner | 20 Apr 04:12 2014

IP addresses, NetBlocks, and ASNs

Does anyone have some good tricks for mapping IP addresses to ASNs (autonomous system numbers) in very large volumes?

This is probably more of a "how would you approach this problem?" than a "can you help me tweak this query?" question.

I have a very large number of IP addresses (many millions) that are found in some security-related logs.  I
need a fast way to categorize these back to their ASNs and country codes.  I have a table that was
generated from BGP routing files to create a list of netblocks and their corresponding ASNs.  For small
groups of IP addresses (hundreds or thousands) it has always been fast enough to simply do something like:

select count(*), asn_number from logaddr a, ipv4 b where a.ip <<= b.net_block group by asn_number

where logaddr has information about IP addresses encountered and
where "ipv4" has a list of mappings of which netblocks belong to which ASNs (yeah, it is 525,000+ Netblocks ...)

I'm trying to figure out if there is a better way to use the "cidr" netblock to speed up the lookup and matching.

I've tried playing with going from ASN to the matching IP addresses, or the other way around ... for example,
here is the EXPLAIN for looking at all the IP addresses in logaddr (currently about 9 million records)
that use ASN = 2119:

explain select count(distinct ip), country_code, asn_number from ipv4 a, logaddr b 
where b.ip <<= a.net_block and a.asn_number = 2119 
group by country_code, asn_number;

"GroupAggregate  (cost=36851922.32..38215407.51 rows=4 width=18)"
"  ->  Sort  (cost=36851922.32..37192793.61 rows=136348515 width=18)"
"        Sort Key: a.country_code, a.asn_number"
"        ->  Nested Loop  (cost=0.00..4448316.05 rows=136348515 width=18)"
"              Join Filter: ((b.ip)::inet <<= (a.net_block)::inet)"
"              ->  Seq Scan on logaddr b  (cost=0.00..347394.01 rows=9089901 width=7)"

Ivan Voras | 20 Apr 01:15 2014

tsearch2, large data and indexes

Hello,

If a table contains simple fields as well as large (hundreds of KiB)
text fields, will accessing only the simple fields cause the entire
record data, including the large fields, to be read and unpacked?
(e.g. SELECT int_field FROM table_with_large_text)

More details: after thinking about it some more, it might have
something to do with tsearch2 and indexes: the large data in this case
is a tsvector, indexed with GIN, and the query plan involves a
re-check condition.

The query is of the form:
SELECT simple_fields FROM table WHERE fts @@ to_tsquery('...').

Does the "re-check condition" mean that the original tsvector data is
always read from the table in addition to the index? That would be
very wasteful since data is practically duplicated in the table and in
the index. Any way around it?
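
If the recheck does force the heap tsvector to be read, one commonly suggested workaround, sketched here with hypothetical table and column names, is to keep the tsvector in a narrow side table so that queries on the main table never have to de-TOAST it.

-- Sketch only; all table and column names are hypothetical.
CREATE TABLE doc (
    doc_id    serial PRIMARY KEY,
    int_field integer,
    body      text            -- the large text stays here
);

CREATE TABLE doc_fts (
    doc_id integer PRIMARY KEY REFERENCES doc,
    fts    tsvector
);
CREATE INDEX doc_fts_gin_idx ON doc_fts USING gin (fts);

-- The @@ condition (and any recheck) only touches the narrow doc_fts rows;
-- the wide doc row is fetched only for matching ids.
SELECT d.int_field
FROM doc d
JOIN doc_fts f USING (doc_id)
WHERE f.fts @@ to_tsquery('...');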


Hot standby 9.2.1 PANIC: WAL contains references to invalid pages

Hi Team,

For the last 2 days we have been facing an issue with replication.

WARNING:  page 21 of relation base/1193555/19384612 does not exist
CONTEXT:  xlog redo insert: rel 1663/1193555/19384612; tid 21/1
PANIC:  WAL contains references to invalid pages
CONTEXT:  xlog redo insert: rel 1663/1193555/19384612; tid 21/1
LOG:  startup process (PID 20622) was terminated by signal 6: Aborted
LOG:  terminating any other active server processes

The standby server went down with this error.

We are just using a warm standby, but we enabled wal_level = 'hot_standby' on the master server.

I just read on this mailing list that PostgreSQL 9.2.7 has a fix for this.

But as of now,
if I change wal_level to 'archive', will this problem go away? We are only using a warm standby, so should we change wal_level to 'archive'? Could you please reply to this mail as soon as possible?
--
Best Regards,
Vishalakshi.N

Franck Routier | 17 Apr 17:11 2014

Fast distinct not working as expected

Hi,

we are using a single-column index on a huge table to try to make a quick
SELECT DISTINCT on the column.

This used to work fine, but... it does not anymore. We don't know what
happened.

Here are the facts:

- request:
SELECT  dwhinv___rfovsnide::varchar FROM dwhinv WHERE dwhinv___rfovsnide
> '201212_cloture' ORDER BY dwhinv___rfovsnide LIMIT 1

- Plan :
Limit  (cost=0.00..1.13 rows=1 width=12) (actual time=5798.915..5798.916
rows=1 loops=1)
  ->  Index Scan using vsn_idx on dwhinv  (cost=0.00..302591122.05
rows=267473826 width=12) (actual time=5798.912..5798.912 rows=1 loops=1)
        Index Cond: ((dwhinv___rfovsnide)::text > '201212_cloture'::text)
Total runtime: 5799.141 ms

- default_statistics_target = 200;

- postgresql Version 8.4

- Index used :
CREATE INDEX vsn_idx
  ON dwhinv
  USING btree
  (dwhinv___rfovsnide);

There are 26 distinct values in the column.
This query used to take some milliseconds at most. The index has been
freshly recreated.

What could be the problem?
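
One widely used way to get a fast DISTINCT over a low-cardinality indexed column is the recursive "loose index scan" emulation described on the PostgreSQL wiki; WITH RECURSIVE is available from 8.4 onward. A sketch against this table, untested here:

-- Each step jumps straight to the next distinct value via vsn_idx
-- instead of walking every row.
WITH RECURSIVE t AS (
    (SELECT dwhinv___rfovsnide FROM dwhinv
     ORDER BY dwhinv___rfovsnide LIMIT 1)
    UNION ALL
    SELECT (SELECT d.dwhinv___rfovsnide FROM dwhinv d
            WHERE d.dwhinv___rfovsnide > t.dwhinv___rfovsnide
            ORDER BY d.dwhinv___rfovsnide LIMIT 1)
    FROM t
    WHERE t.dwhinv___rfovsnide IS NOT NULL
)
SELECT dwhinv___rfovsnide FROM t WHERE dwhinv___rfovsnide IS NOT NULL;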

Franck


Linos | 16 Apr 17:13 2014

unneeded joins on view

Hello all,

I am trying to simplify some of the queries I use with my database by creating a big view of all the possible
attributes my items can have; the view is rather large:

http://pastebin.com/ScnJ8Hd3

I thought that PostgreSQL would optimize away joins on columns I don't ask for when I use the view, but it
doesn't. This query:

SELECT referencia
FROM articulo_view
WHERE referencia = '09411000';

has this query plan:

http://explain.depesz.com/s/4lW0

Maybe I am surpassing some limit? I have tried changing from_collapse_limit and join_collapse_limit, but
the planner still joins the unneeded tables.

Is it possible to tell PostgreSQL to do the right thing? If so, how? Thanks!
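
Relevant background, with a hedged sketch: the planner only removes a join from a view when it is a LEFT JOIN whose inner side is provably unique on the join key and none of its columns are referenced; plain inner joins are never removed, regardless of from_collapse_limit or join_collapse_limit. The table names below are hypothetical, not taken from the pastebin.

-- Sketch with hypothetical tables. The LEFT JOIN to "color" can be dropped by
-- the planner because color_id is the primary key of "color" and no column of
-- "color" is referenced by the outer query.
CREATE TABLE articulo (referencia text PRIMARY KEY, color_id integer);
CREATE TABLE color    (color_id integer PRIMARY KEY, nombre text);

CREATE VIEW articulo_view AS
SELECT a.referencia, c.nombre AS color
FROM articulo a
LEFT JOIN color c ON c.color_id = a.color_id;

EXPLAIN SELECT referencia FROM articulo_view WHERE referencia = '09411000';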

Regards,
Miguel Angel.


sheishere b | 16 Apr 15:05 2014

Queries very slow after data size increases

Following are the tables

---------------------------------------------------------------------------------------------------------------------------------------------------------------
CREATE TABLE  equipment (
  contract_nr varchar(32) COLLATE "C" NULL DEFAULT NULL,
  name varchar(64) COLLATE "C" DEFAULT '',
  latitude numeric(10,7) NOT NULL,
  longitude numeric(10,7) NOT NULL,
  mac_addr_w varchar(17) COLLATE "C" NOT NULL,
  mac_addr_wl varchar(17) COLLATE "C" NOT NULL,
  identifier varchar(32) COLLATE "C" NOT NULL,
  he_identifier varchar(17) COLLATE "C" DEFAULT '',
  number_of_wlans integer NOT NULL DEFAULT '1' ,
  regions varchar(64) COLLATE "C" DEFAULT '',
  external_id varchar(64) COLLATE "C",   
  PRIMARY KEY (external_id)
) ;

CREATE INDEX equipment_mac_addr_w_idx ON equipment (mac_addr_w);
CREATE INDEX equipment_latitude_idx ON equipment (latitude);
CREATE INDEX equipment_longitude_idx ON equipment (longitude);

no of rows - 15000

---------------------------------------------------------------------------------------------------------------------------------------------------------------

create table accounting (
  equip_wan varchar(17) COLLATE "C" NOT NULL,
  equip_wlan varchar(17) COLLATE "C" NOT NULL,
  identifier varchar(32) COLLATE "C" NOT NULL,
  he_identifier varchar(17) COLLATE "C" NULL DEFAULT NULL,
  time_stamp TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP,
  in_oc bigint NOT NULL DEFAULT 0,
  out_oc bigint NOT NULL DEFAULT 0
  );

CREATE INDEX accounting_time_stamp_idx ON accounting (time_stamp);
CREATE INDEX accounting_equip_wan_idx ON accounting (equip_wan);

no of rows - 36699300
This table is growing rapidly

---------------------------------------------------------------------------------------------------------------------------------------------------------------

create table accounting_fifteenminute_aggregate (
  equip_wan varchar(17) COLLATE "C" NOT NULL,
  identifier varchar(32) COLLATE "C" NOT NULL,
  time_stamp TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP,  
  in_oc bigint NOT NULL DEFAULT 0,
  out_oc bigint NOT NULL DEFAULT 0
  );

CREATE INDEX accounting_15min_agg_timestamp_idx ON accounting_fifteenminute_aggregate (time_stamp);
CREATE INDEX accounting_15min_agg_equip_wan_idx ON accounting_fifteenminute_aggregate (equip_wan);

no of rows - 4800000

---------------------------------------------------------------------------------------------------------------------------------------------------------------

create table accounting_hourly_aggregate (
  equip_wan varchar(17) COLLATE "C" NOT NULL,
  identifier varchar(32) COLLATE "C" NOT NULL,
  time_stamp TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP,  
  in_oc bigint NOT NULL DEFAULT 0,
  out_oc bigint NOT NULL DEFAULT 0
  );

CREATE INDEX accounting_hourly_agg_timestamp_idx ON accounting_hourly_aggregate (time_stamp);
CREATE INDEX accounting_hourly_agg_equip_wan_idx ON accounting_hourly_aggregate (equip_wan);

no of rows - 1400000

<TABLE DEFINITION ENDS>---------------------------------------------------------------------------------------------------------------------------------------------------------------

The two queries below run every 15 minutes and every hour respectively from a Tomcat node using JDBC. The hourly query uses the 15-minute aggregate.
Tomcat and the DB are on different nodes.

(1) INSERT INTO accounting_fifteenminute_aggregate 
Select equip_wan,identifier,'2014-04-16 14:00:00.0',sum(in_oc),sum(out_oc) 
from accounting where time_stamp >= '2014-04-16 13:45:00.0' and time_stamp < '2014-04-16 14:00:00.0' group by equip_wan,identifier

time taken for execution of the above query - 300 sec

EXPLAIN (ANALYZE, BUFFERS) 
                                                                                      QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Insert on accounting_fifteenminute_aggregate  (cost=253.37..253.47 rows=4 width=89) (actual time=196833.655..196833.655 rows=0 loops=1)
   Buffers: shared hit=23941 read=4092 dirtied=2675
   ->  Subquery Scan on "*SELECT*"  (cost=253.37..253.47 rows=4 width=89) (actual time=3621.621..3763.701 rows=3072 loops=1)
         Buffers: shared hit=3494 read=93
         ->  HashAggregate  (cost=253.37..253.41 rows=4 width=41) (actual time=3621.617..3737.370 rows=3072 loops=1)
               Buffers: shared hit=3494 read=93
               ->  Index Scan using accounting_time_stamp_idx on accounting  (cost=0.00..220.56 rows=3281 width=41) (actual time=3539.890..3601.808 rows=3680 loops=1)
                     Index Cond: ((time_stamp >= '2014-04-16 13:45:00+05:30'::timestamp with time zone) AND (time_stamp < '2014-04-16 14:00:00+05:30'::timestamp with time zone))
                     Buffers: shared hit=3494 read=93
 Total runtime: 196833.781 ms
(10 rows)


(2) INSERT INTO accounting_hourly_aggregate 
Select equip_wan,identifier,'2014-04-16 14:00:00.0',sum(in_oc),sum(out_oc) 
from accounting_fifteenminute_aggregate where time_stamp > '2014-04-16 13:00:00.0' group by equip_wan,identifier

time taken for execution of the above query - 280 sec

*************************************************************************************************************************************
The query below is a report query which uses the above aggregated tables:

Select
queryA.wAddr,
queryA.name,
queryA.dBy,
queryA.upBy,
(queryA.upBy + queryA.dBy) as totalBy
From
(Select 
queryC.wAddr,
queryC.name,
COALESCE(queryI.dBy, 0) as dBy,
COALESCE(queryI.upBy, 0) as upBy
From
(Select 
DISTINCT ON(mac_addr_w) 
mac_addr_w as wAddr,
name
From equipment
where 
(latitude BETWEEN -90.0 AND 90.0)  AND 
(longitude BETWEEN -180.0 AND 180.0) 
) as queryC
Left Join
(Select 
equip_wan as wAddr,
SUM(in_oc) as dBy,
SUM(out_oc) as upBy
From accounting_hourly_aggregate
where time_stamp > '2014-04-13 16:00:00.0' and time_stamp <= '2014-04-14 16:00:00.0'
Group by equip_wan) as queryI
On queryC.wAddr = queryI.wAddr) as queryA
order by totalBy DESC Limit 10;

The above query takes 3 min 28 sec to execute.
So I did a manual ANALYZE to see if there was any performance benefit. ANALYZE accounting_hourly_aggregate takes 40 sec.
After the ANALYZE, the same query takes 16 sec.
But 40 minutes after analyzing the accounting_hourly_aggregate table, the query execution time increases again to a few minutes.
The above query is run from the PostgreSQL command line.
Autovacuum is running with its default settings.
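
Since the good plan apparently only survives while the statistics are fresh, one common mitigation, sketched here with purely illustrative thresholds, is to have autovacuum analyze the fast-growing tables more aggressively than the default 10% change threshold.

-- Sketch only; the numbers are illustrative, not recommendations.
ALTER TABLE accounting_hourly_aggregate
    SET (autovacuum_analyze_scale_factor = 0.01,
         autovacuum_analyze_threshold    = 1000);
ALTER TABLE accounting_fifteenminute_aggregate
    SET (autovacuum_analyze_scale_factor = 0.01,
         autovacuum_analyze_threshold    = 1000);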

Explain of the above query

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=11664.77..11664.80 rows=10 width=92) (actual time=159613.007..159613.010 rows=10 loops=1)
   Buffers: shared hit=2282 read=3528
   ->  Sort  (cost=11664.77..11689.77 rows=10000 width=92) (actual time=159613.005..159613.007 rows=10 loops=1)
         Sort Key: ((COALESCE(queryI.upBy, 0::numeric) + COALESCE(queryI.dBy, 0::numeric)))
         Sort Method: top-N heapsort  Memory: 26kB
         Buffers: shared hit=2282 read=3528
         ->  Merge Left Join  (cost=9748.22..11448.68 rows=10000 width=92) (actual time=157526.220..159607.130 rows=10000 loops=1)
               Merge Cond: ((equipment.mac_addr_w)::text = (queryI.wAddr)::text)
               Buffers: shared hit=2282 read=3528
               ->  Unique  (cost=0.00..1538.56 rows=10000 width=28) (actual time=84.291..2151.497 rows=10000 loops=1)
                     Buffers: shared hit=591 read=840
                     ->  Index Scan using equipment_mac_addr_w_idx on equipment  (cost=0.00..1499.35 rows=15684 width=28) (actual time=84.288..2145.990 rows=15684 loops=1)
                           Filter: ((latitude >= (-90.0)) AND (latitude <= 90.0) AND (longitude >= (-180.0)) AND (longitude <= 180.0))
                           Buffers: shared hit=591 read=840
               ->  Sort  (cost=9748.22..9750.20 rows=793 width=82) (actual time=157441.910..157443.710 rows=6337 loops=1)
                     Sort Key: queryI.wAddr
                     Sort Method: quicksort  Memory: 688kB
                     Buffers: shared hit=1691 read=2688
                     ->  Subquery Scan on queryI  (cost=9694.17..9710.03 rows=793 width=82) (actual time=157377.819..157381.314 rows=6337 loops=1)
                           Buffers: shared hit=1691 read=2688
                           ->  HashAggregate  (cost=9694.17..9702.10 rows=793 width=34) (actual time=157377.819..157380.154 rows=6337 loops=1)
                                 Buffers: shared hit=1691 read=2688
                                 ->  Index Scan using accounting_hourly_agg_idx on accounting_hourly_aggregate  (cost=0.00..8292.98 rows=186826 width=34) (actual time=1328.363..154164.439 rows=193717 loops=1)
                                       Index Cond: ((time_stamp > '2014-04-14 12:00:00+05:30'::timestamp with time zone) AND (time_stamp <= '2014-04-15 18:00:00+05:30'::timestamp with time zone))
                                       Buffers: shared hit=1691 read=2688
 Total runtime: 159613.100 ms
(26 rows)

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Following values have been changed in postgresql.conf
shared_buffers = 2GB
work_mem = 32MB
maintenance_work_mem = 512MB
wal_buffers = 16MB
checkpoint_segments = 32
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9

System config - 
8 gb RAM
hard disk - 300 gb
Linux 2.6.32-358.el6.x86_64 #1 SMP Tue Jan 29 11:47:41 EST 2013 x86_64 x86_64 x86_64 GNU/Linux

Postgres version
PostgreSQL 9.2.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-52), 64-bit

Basically all the queries are slowing down as the raw tables grow. Will partitioning help?
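
On the partitioning question, a minimal sketch of what time-based partitioning of the raw accounting table could look like on 9.2: inheritance plus a routing trigger, since declarative partitioning only exists in PostgreSQL 10 and later. Only one child and a single-branch trigger are shown; real use would create one child per period, and the monthly period here is illustrative.

-- Sketch only; table and column names follow the post.
CREATE TABLE accounting_y2014m04 (
    CHECK (time_stamp >= '2014-04-01' AND time_stamp < '2014-05-01')
) INHERITS (accounting);

CREATE INDEX accounting_y2014m04_time_stamp_idx
    ON accounting_y2014m04 (time_stamp);

CREATE OR REPLACE FUNCTION accounting_insert_trigger() RETURNS trigger AS $$
BEGIN
    IF NEW.time_stamp >= '2014-04-01' AND NEW.time_stamp < '2014-05-01' THEN
        INSERT INTO accounting_y2014m04 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'no partition for time_stamp %', NEW.time_stamp;
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER accounting_insert_route
    BEFORE INSERT ON accounting
    FOR EACH ROW EXECUTE PROCEDURE accounting_insert_trigger();

-- With constraint_exclusion = partition (the default), the time-bounded
-- aggregation queries would then only scan the relevant child tables.
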
Nick Eubank | 16 Apr 03:36 2014

Workaround for working_mem max value in windows?

Hi all,


A few years ago someone said PostgreSQL on Windows can't set work_mem above about 2 GB (www.postgresql.org/message-id/17895.1315869622@sss.pgh.pa.us -- the same seems to apply to maintenance_work_mem). I'm finding the limit is still present.

I'm doing single-user, single-connection, data-intensive queries and would like to set a higher value on Windows to make better use of the 16 GB of built-in RAM (I don't control the platform, so I can't jump to Linux).

Has anyone found a workaround?

Thanks!

Nick
Dave Cramer | 15 Apr 17:57 2014

Testing strategies

I have a client wanting to test PostgreSQL on ZFS running Linux. Other than pgbench, are there any other benchmarks that are easy to test?

One of the possible concerns is fragmentation over time. Any ideas on how to fragment the database before running pgbench? Also there is some concern about fragmentation of the WAL logs. I am looking at testing with and without the WAL logs on ZFS.

Any other specific concerns?

Dave Cramer
credativ ltd (Canada)
78 Zina St
Orangeville, ON
Canada. L9W 1E8

Office: +1 (905) 766-4091
Mobile: +1 (519) 939-0336

===================================
Canada: http://www.credativ.ca
USA: http://www.credativ.us
Germany: http://www.credativ.de
Netherlands: http://www.credativ.nl
UK: http://www.credativ.co.uk
India: http://www.credativ.in
===================================
Mel Llaguno | 15 Apr 16:31 2014

Re: HFS+ pg_test_fsync performance



My 2 cents:

The results are not surprising. In the Linux environment, the I/O calls of pg_test_fsync use O_DIRECT (PG_O_DIRECT) together with the O_SYNC or O_DSYNC calls, so in practice it waits for the "answer" from the storage, bypassing the cache, in sync mode. On Mac OS X it does not do so: it only uses the O_SYNC or O_DSYNC calls without O_DIRECT, so in practice it uses the filesystem cache even though it asks for the I/O calls to be synced.


Bye

Mat Dba

--------

Thanks for the explanation. Given that OSX always seems to use filesystem cache, is there a way to measure fsync performance that is equivalent to Linux? Or will the use of pg_test_fsync always be inflated under OSX? The reason I ask is that we would like to make a case with a customer that PG performance on OSX/HFS+ would be sub-optimal compared to using Linux/EXT4 (or FreeBSD/UFS2 for that matter).

Thanks, Mel
Mel Llaguno | 15 Apr 00:32 2014

HFS+ pg_test_fsync performance

I was given anecdotal information regarding HFS+ performance under OSX as
being unsuitable for production PG deployments and that pg_test_fsync
could be used to measure the relative speed versus other operating systems
(such as Linux). In my performance lab, I have a number of similarly
equipped Linux hosts (Ubuntu 12.04 64-bit LTS Server w/128Gb RAM / 2 OWC
6g Mercury Extreme SSDs / 7200rpm SATA3 HDD / 16 E5-series cores) that I
used to capture baseline Linux numbers. As we generally recommend our
customers use SSD (the s3700 recommended by PG), I wanted to perform a
comparison. On these beefy machines I ran the following tests:

SSD:

# pg_test_fsync -f ./fsync.out -s 30
30 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
        open_datasync                                 n/a
        fdatasync                        2259.652 ops/sec     443 usecs/op
        fsync                            1949.664 ops/sec     513 usecs/op
        fsync_writethrough                            n/a
        open_sync                        2245.162 ops/sec     445 usecs/op

Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
        open_datasync                                 n/a
        fdatasync                        2161.941 ops/sec     463 usecs/op
        fsync                            1891.894 ops/sec     529 usecs/op
        fsync_writethrough                            n/a
        open_sync                        1118.826 ops/sec     894 usecs/op

Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB
in different write open_sync sizes.)
         1 * 16kB open_sync write        2171.558 ops/sec     460 usecs/op
         2 *  8kB open_sync writes       1126.490 ops/sec     888 usecs/op
         4 *  4kB open_sync writes        569.594 ops/sec    1756 usecs/op
         8 *  2kB open_sync writes        285.149 ops/sec    3507 usecs/op
        16 *  1kB open_sync writes        142.528 ops/sec    7016 usecs/op

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.)
        write, fsync, close              1947.557 ops/sec     513 usecs/op
        write, close, fsync              1951.082 ops/sec     513 usecs/op

Non-Sync'ed 8kB writes:
        write                           481296.909 ops/sec       2 usecs/op


HDD:

pg_test_fsync -f /tmp/fsync.out -s 30
30 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
        open_datasync                                 n/a
        fdatasync                         105.783 ops/sec    9453 usecs/op
        fsync                              27.692 ops/sec   36111 usecs/op
        fsync_writethrough                            n/a
        open_sync                         103.399 ops/sec    9671 usecs/op

Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
        open_datasync                                 n/a
        fdatasync                         104.647 ops/sec    9556 usecs/op
        fsync                              27.223 ops/sec   36734 usecs/op
        fsync_writethrough                            n/a
        open_sync                          55.839 ops/sec   17909 usecs/op

Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB
in different write open_sync sizes.)
         1 * 16kB open_sync write         103.581 ops/sec    9654 usecs/op
         2 *  8kB open_sync writes         55.207 ops/sec   18113 usecs/op
         4 *  4kB open_sync writes         28.320 ops/sec   35311 usecs/op
         8 *  2kB open_sync writes         14.581 ops/sec   68582 usecs/op
        16 *  1kB open_sync writes          7.407 ops/sec  135003 usecs/op

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.)
        write, fsync, close                27.228 ops/sec   36727 usecs/op
        write, close, fsync                27.108 ops/sec   36890 usecs/op

Non-Sync'ed 8kB writes:
        write                           466108.001 ops/sec       2 usecs/op


-------

So far, so good. Local HDD vs. SSD shows a significant difference in fsync
performance. Here are the corresponding fstab entries :

/dev/mapper/cim-base          /opt/cim  ext4  defaults,noatime,nodiratime,discard  0  2   (SSD)
/dev/mapper/p--app--lin-root  /         ext4  errors=remount-ro                    0  1   (HDD)

I then tried the pg_test_fsync on my OSX Mavericks machine (quad-core i7 /
Intel 520SSD / 16GB RAM) and got the following results :

# pg_test_fsync -s 30 -f ./fsync.out
30 seconds per test
Direct I/O is not supported on this platform.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
        open_datasync                    8752.240 ops/sec     114 usecs/op
        fdatasync                        8556.469 ops/sec     117 usecs/op
        fsync                            8831.080 ops/sec     113 usecs/op
        fsync_writethrough                735.362 ops/sec    1360 usecs/op
        open_sync                        8967.000 ops/sec     112 usecs/op

Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
        open_datasync                    4256.906 ops/sec     235 usecs/op
        fdatasync                        7485.242 ops/sec     134 usecs/op
        fsync                            7335.658 ops/sec     136 usecs/op
        fsync_writethrough                716.530 ops/sec    1396 usecs/op
        open_sync                        4303.408 ops/sec     232 usecs/op

Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB
in different write open_sync sizes.)
         1 * 16kB open_sync write        7559.381 ops/sec     132 usecs/op
         2 *  8kB open_sync writes       4537.573 ops/sec     220 usecs/op
         4 *  4kB open_sync writes       2539.780 ops/sec     394 usecs/op
         8 *  2kB open_sync writes       1307.499 ops/sec     765 usecs/op
        16 *  1kB open_sync writes        659.985 ops/sec    1515 usecs/op

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.)
        write, fsync, close              9003.622 ops/sec     111 usecs/op
        write, close, fsync              8035.427 ops/sec     124 usecs/op

Non-Sync'ed 8kB writes:
        write                           271112.074 ops/sec       4 usecs/op

-------


These results were unexpected and surprising. In almost every metric (with
the exception of the Non-Sync'ed 8kB writes), OSX Mavericks 10.9.2 using
HFS+ out-performed my Ubuntu servers. While the SSDs come from different
manufacturers, both use the SandForce SF-2281 controller.

Plausible explanations of the apparent disparity in fsync performance
would be welcome.

Thanks, Mel

P.S. One more thing: I found this article, which maps fsync mechanisms to
operating systems:
http://www.westnet.com/~gsmith/content/postgresql/TuningPGWAL.htm


This article suggests that both open_datasync and fdatasync are _not_
supported for OSX, but the pg_test_fsync results suggest otherwise.
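
A hedged aside: on OS X only fsync_writethrough issues F_FULLFSYNC and so actually forces data through the drive's write cache, which is consistent with it being the one method above that shows HDD-like latency (~735 ops/sec). A durability-equivalent comparison against the Linux O_DIRECT numbers would therefore use that method, e.g.:

-- Sketch: check the active method, then compare with the write-through one.
SHOW wal_sync_method;
-- in postgresql.conf on OS X: wal_sync_method = fsync_writethrough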


