Tom Lane | 1 Apr 01:08 2009

Re: How to get parallel restore in PG 8.4 to work?

henk de wit <henk53602 <at> hotmail.com> writes:
> For performance reasons (obviously ;)) I'm experimenting with parallel restore in PG 8.4. I grabbed the
> latest source snapshot (of today, March 30) and compiled this with zlib support. I dumped a DB from PG 8.3.5
> (using maximum compression). I got this message however:
> postgres <at> mymachine:/home/henk/postgresql-8.4/bin$ time
> ./pg_restore -p 5434 -h localhost -U henk -d db_test -j 8 -Fc
> /home/henk/test-databases/dumps/db_test.custom
> pg_restore: [archiver] WARNING: archive is compressed, but this
> installation does not support compression -- no data will be available
> pg_restore: [archiver] cannot restore from compressed archive (compression
> not supported in this installation)

As far as one can tell from here, you built *without* zlib support.
This is unrelated to parallel restore as such.

			regards, tom lane


Greg Smith | 1 Apr 01:35 2009

Re: Strange behavior: pgbench and new Linux kernels

On Tue, 31 Mar 2009, Kevin Grittner wrote:

>>>> On Thu, Apr 17, 2008 at  7:26 PM, Greg Smith wrote:
>
>> On this benchmark 2.6.25 is the worst kernel yet:
>
>> It seems I have a lot of work ahead of me here
>> to nail down and report what's going on here.
>
> I don't remember seeing a follow-up on this issue from last year.
> Are there still any particular kernels to avoid based on this?

I never got any confirmation that the patches that came out of my 
discussions with the kernel developers were ever merged.  I'm in the 
middle of a bunch of pgbench tests this week, and one of the things I 
planned to try was seeing if the behavior has changed in 2.6.28 or 2.6.29. 
I'm speaking about pgbench at the PostgreSQL East conference this weekend 
and will have an update by then (along with a new toolchain for automating 
large quantities of pgbench tests).

--
* Greg Smith gsmith <at> gregsmith.com http://www.gregsmith.com Baltimore, MD


Mark Kirkwood | 1 Apr 09:57 2009

Re: Raid 10 chunksize

Scott Carey wrote:
>
> A little extra info here >>  md, LVM, and some other tools do not allow the
> file system to use write barriers properly.... So those are on the bad list
> for data integrity with SAS or SATA write caches without battery back-up.
> However, this is NOT an issue on the postgres data partition.  Data fsync
> still works fine, its the file system journal that might have out-of-order
> writes.  For xlogs, write barriers are not important, only fsync() not
> lying.
>
> As an additional note, ext4 uses checksums per block in the journal, so it
> is resistant to out of order writes causing trouble.  The test compared to
> here was on ext4, and most likely the speed increase is partly due to that.
>
>   

[Looks at  Stef's  config - 2x 7200 rpm SATA RAID 0]  I'm still highly 
suspicious of such a system being capable of outperforming one with the 
same number of (effective) - much faster - disks *plus* a dedicated WAL 
disk pair... unless it is being a little loose about fsync! I'm happy to 
believe ext4 is better than ext3 - but not that much!

However, it's great to have so many different results to compare against!

Cheers

Mark


Mark Kirkwood | 1 Apr 10:11 2009

Re: Raid 10 chunksize

Scott Carey wrote:
> On 3/25/09 9:28 PM, "Mark Kirkwood" <markir <at> paradise.net.nz> wrote:
>
>   
>>
>> Rebuilt with 64K chunksize:
>>
>> transaction type: TPC-B (sort of)
>> scaling factor: 100
>> number of clients: 24
>> number of transactions per client: 12000
>> number of transactions actually processed: 288000/288000
>> tps = 866.512162 (including connections establishing)
>> tps = 866.651320 (excluding connections establishing)
>>
>>
>> So 64K looks quite a bit better. I'll endeavor to try out 256K next week
>> too.
>>     
>
> Just go all the way to 1MB, md _really_ likes 1MB chunk sizes for some
> reason.  Benchmarks right and left on google show this to be optimal.  My
> tests with md raid 0 over hardware raid 10's ended up with that being
> optimal as well.
>
> Greg's notes on aligning partitions to the chunk are key as well.
>
>   
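(As a concrete sketch of that 1MB-chunk suggestion -- the device names and
disk count below are illustrative, not taken from this thread:)

mdadm --create /dev/md0 --level=10 --raid-devices=4 --chunk=1024 /dev/sd[bcde]1
# --chunk is in KB, so 1024 = 1MB; per Greg's note, any partition and the
# filesystem on top should be aligned to the chunk size as well
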
Rebuilt with 256K chunksize:


Mahu Vasile | 1 Apr 11:10 2009

PostgreSQL

Hi,
I have some problems with PostgreSQL 8.3.6.
The client (Microsoft Access 2000) links to PostgreSQL tables via ODBC and
works with them. Sometimes the following appears on the client:
       ODBC--call failed.
       Could not send Query(connection dead)(#26);

The PostgreSQL log shows:
     could not receive data from client: Connection timed out
     unexpected EOF on client connection

The client (Microsoft Access 2000) log shows:
     STATEMENT ERROR: func=SC_execute, desc='', errnum=27, sqlstate=,
errmsg='Could not send query to backend'

I want to mention that the problem does not appear with PostgreSQL 8.1.

We couldn't determine when it happens, or why.

postgresql.conf:
max_connections = 256
tcp_keepalives_idle = 1         # TCP_KEEPIDLE, in seconds;
                                        # 0 selects the system default
tcp_keepalives_interval = 1             # TCP_KEEPINTVL, in seconds;
                                        # 0 selects the system default
tcp_keepalives_count = 1                # TCP_KEEPCNT;
shared_buffers = 512MB
work_mem = 32MB                         # min 64kB
maintenance_work_mem = 256MB            # min 1MB
max_fsm_pages = 262145                  # min max_fsm_relations*16, 6 

henk de wit | 1 Apr 14:22 2009

Re: How to get parallel restore in PG 8.4 to work?

Hi,

> henk de wit <henk53602 <at> hotmail.com> writes:
>> For performance reasons (obviously ;)) I'm experimenting with parallel restore in PG 8.4. [...] I got this message however:
>> [...]
>> pg_restore: [archiver] WARNING: archive is compressed, but this
>> installation does not support compression -- no data will be available

> As far as one can tell from here, you built *without* zlib support.
> This is unrelated to parallel restore as such.

I see. Thanks for the confirmation. I would have sworn I built with zlib support, but obviously I did something wrong. For some reason that I can't remember now, I did omit support for readline. Could that have anything to do with it, or are those completely unrelated?
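
(For reference, a minimal sketch of how a source build's zlib support can be
checked -- zlib is enabled by default, so a missing zlib development package
at configure time is the usual culprit; the paths below are illustrative, and
readline only affects psql, so it should be unrelated:)

grep -i libz config.log                    # shows whether configure found -lz
grep HAVE_LIBZ src/include/pg_config.h     # should read: #define HAVE_LIBZ 1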

To continue testing, I imported a PG 8.3 dump in the plain format into PG 8.4, dumped this again in the custom format and imported that again into PG 8.4 using the parallel restore feature. This proved to be very beneficial. Top shows that all the cores are being used:

./pg_restore -p 5433 -h localhost -d db_test -j 8 -Fc
/ssd/tmp/test_dump.custom

top - 11:33:37 up 1 day, 18:07,  5 users,  load average: 5.63, 2.12, 0.97
Tasks: 187 total,   7 running, 180 sleeping,   0 stopped,   0 zombie
Cpu0  : 91.7%us,  8.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.3%si, 0.0%st
Cpu1  : 90.0%us,  9.3%sy,  0.0%ni,  0.7%id,  0.0%wa,  0.0%hi,  0.0%si, 0.0%st
Cpu2  : 81.5%us, 15.9%sy,  0.0%ni,  2.3%id,  0.0%wa,  0.0%hi,  0.3%si, 0.0%st
Cpu3  : 87.0%us, 10.3%sy,  0.0%ni,  2.3%id,  0.0%wa,  0.0%hi,  0.3%si, 0.0%st
Cpu4  : 91.4%us,  8.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.3%hi,  0.3%si, 0.0%st
Cpu5  : 66.8%us, 16.3%sy,  0.0%ni,  4.3%id, 11.0%wa,  0.0%hi,  1.7%si, 0.0%st
Cpu6  : 76.0%us, 12.7%sy,  0.0%ni,  0.0%id, 10.7%wa,  0.0%hi,  0.7%si, 0.0%st
Cpu7  : 97.3%us,  2.3%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.3%si, 0.0%st
Mem:  33021204k total, 32861900k used,   159304k free,       40k buffers
Swap:  7811064k total,     2164k used,  7808900k free, 29166332k cached


The performance numbers are quite amazing. The dump is approximately 19GB in size and the filesystem I use is xfs on Debian Lenny. Using the normal restore (with a single process), a full restore takes 45 minutes; with 8 processes this drops to just 14 minutes and 23 seconds, and with 16 processes it drops further to just 11 minutes and 46 seconds.
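
(The comparison above as a sketch -- the port and paths follow the run shown
earlier, the single-process case simply omits -j, and the target database
would need to be recreated between runs:)

time ./pg_restore -p 5433 -h localhost -d db_test -Fc /ssd/tmp/test_dump.custom         # ~45 min
time ./pg_restore -p 5433 -h localhost -d db_test -j 8 -Fc /ssd/tmp/test_dump.custom    # ~14 min 23 s
time ./pg_restore -p 5433 -h localhost -d db_test -j 16 -Fc /ssd/tmp/test_dump.custom   # ~11 min 46 s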

I still have some work to do to find out why dumping in the custom format is so much slower. Unfortunately I forgot to time this exactly, but my feeling was that it was 'very slow'. I'll try to get some exact numbers though.

Kind regards,
Henk




Tom Lane | 1 Apr 15:46 2009

Re: How to get parallel restore in PG 8.4 to work?

henk de wit <henk53602 <at> hotmail.com> writes:
> I still have some work to do to find out why dumping in the custom
> format is so much slower.

Offhand the only reason I can see for it to be much different from
plain-text output is that -Fc compresses by default.  If you don't
care about that, try -Fc -Z0.
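
(A sketch of that comparison -- the connection details and output paths are
illustrative, following the earlier post:)

pg_dump -Fc -p 5433 -h localhost db_test > /ssd/tmp/test_dump.custom          # default: zlib-compressed
pg_dump -Fc -Z0 -p 5433 -h localhost db_test > /ssd/tmp/test_dump_z0.custom   # no compression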

			regards, tom lane


Robert Haas | 1 Apr 16:23 2009

Re: PostgreSQL

On Wed, Apr 1, 2009 at 5:10 AM, Mahu Vasile <mahu <at> vrancart.ro> wrote:
> tcp_keepalives_count = 1                # TCP_KEEPCNT;

This might not be what you want.

http://www.postgresql.org/docs/8.3/static/runtime-config-connection.html

Presumably you'd like to wait more than 1 second before declaring the
connection dead...
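
(For example, a less aggressive sketch -- the values are illustrative, not a
recommendation from this thread:)

tcp_keepalives_idle = 60          # seconds of inactivity before the first keepalive probe
tcp_keepalives_interval = 10      # seconds between unanswered probes
tcp_keepalives_count = 5          # probes that may be lost before the connection is considered dead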

Beyond that, it sounds like a client problem more than a PostgreSQL problem.

...Robert


Stef Telford | 1 Apr 16:39 2009

Re: Raid 10 chunksize


Mark Kirkwood wrote:
> Scott Carey wrote:
>>
>> A little extra info here >>  md, LVM, and some other tools do not
>>  allow the file system to use write barriers properly.... So
>> those are on the bad list for data integrity with SAS or SATA
>> write caches without battery back-up. However, this is NOT an
>> issue on the postgres data partition.  Data fsync still works
>> fine, its the file system journal that might have out-of-order
>> writes.  For xlogs, write barriers are not important, only
>> fsync() not lying.
>>
>> As an additional note, ext4 uses checksums per block in the
>> journal, so it is resistant to out of order writes causing
>> trouble.  The test compared to here was on ext4, and most likely
>> the speed increase is partly due to that.
>>
>>
>
> [Looks at  Stef's  config - 2x 7200 rpm SATA RAID 0]  I'm still
> highly suspicious of such a system being capable of outperforming
> one with the same number of (effective) - much faster - disks
> *plus* a dedicated WAL disk pair... unless it is being a little
> loose about fsync! I'm happy to believe ext4 is better than ext3 -
> but not that much!
>
> However, its great to have so many different results to compare
> against!
>
> Cheers
>
> Mark
>
Hello Mark,
    For the record, this is a 'base' debian 5 install (with openVZ, but
postgreSQL is running on the base hardware, not inside a container)
and I have -explicitly- enabled sync in the conf, e.g.:

fsync = on                              # turns forced synchronization on or off
synchronous_commit = on                 # immediate fsync at commit
#wal_sync_method = fsync                # the default is the first option

    In fact, if I turn -off- sync commit, it gets about 200 TPS -slower-
rather than faster. Curiously, I also have an intel x25-m winging its
way here for testing/benching under postgreSQL (along with a vertex
120gb). I had one of the nice lads on the OCZ forum bench against a
30gb vertex ssd, and if you think -my- TPS was crazy.. you should have
seen his.

postgres <at> rob-desktop:~$ /usr/lib/postgresql/8.3/bin/pgbench -c 24 -t
12000 test_db
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 24
number of transactions per client: 12000
number of transactions actually processed: 288000/288000
tps = 3662.200088 (including connections establishing)
tps = 3664.823769 (excluding connections establishing)

    (Nb; Thread here;
http://www.ocztechnologyforum.com/forum/showthread.php?t=54038 )

    Curiously, I think with SSDs there may have to be an 'off' flag
if you put the xlog onto an SSD. It seems to complain about 'too
frequent checkpoints'.

    I can't wait for -either- of the drives to arrive. I want to see
in -my- system what the speed is like for SSDs. The dataset I have to
work with is fairly small (30-40GB), so using an 80GB SSD (even a few
raided) is possible for me. Thankfully ;)

    Regards
    Stef
(ps. I should note, running postgreSQL in a prod environment -without-
a nice UPS is never going to happen on my watch, so, turning on
write-cache (to me) seems like a no-brainer really if it makes this
kind of boost possible)

Greg Smith | 1 Apr 18:08 2009

Re: Raid 10 chunksize

On Wed, 1 Apr 2009, Stef Telford wrote:

> I have -explicitly- enabled sync in the conf...In fact, if I turn -off- 
> sync commit, it gets about 200 -slower- rather than faster.

You should take a look at 
http://www.postgresql.org/docs/8.3/static/wal-reliability.html

And check the output from "hdparm -I" as suggested there.  If turning off 
fsync doesn't improve your performance, there's almost certainly something 
wrong with your setup.  As suggested before, your drives probably have 
write caching turned on.  PostgreSQL is incapable of knowing that, and 
will happily write in an unsafe manner even if the fsync parameter is 
turned on.  There's a bunch more information on this topic at 
http://www.westnet.com/~gsmith/content/postgresql/TuningPGWAL.htm
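
A sketch of that check (the device name is illustrative):

hdparm -I /dev/sda | grep -i 'write cache'   # a leading '*' means the volatile write cache is enabled
hdparm -W0 /dev/sda                          # turn the drive's write cache off for a safe retest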

Also: a run-to-run variation in pgbench results of +/-10% TPS is normal, 
so unless you saw a consistent 200 TPS difference across multiple tests, my 
guess is that changing fsync is doing nothing for you, rather than making 
things slower as you suggested.

> Curiously, I think with SSD's there may have to be an 'off' flag
> if you put the xlog onto an ssd. It seems to complain about 'too
> frequent checkpoints'.

You just need to increase checkpoint_segments from the tiny default if you 
want to push any reasonable numbers of transactions/second through pgbench 
without seeing this warning.  Same thing happens with any high-performance 
disk setup, it's not specific to SSDs.
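
(For example, a sketch -- the value is illustrative; the 8.3 default is only 3:)

checkpoint_segments = 32     # allow more WAL between checkpoints during the benchmark run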

--
* Greg Smith gsmith <at> gregsmith.com http://www.gregsmith.com Baltimore, MD


