Benjamin Johnson | 1 Dec 03:00 2011

Re: Guidance Requested - Bulk Inserting + Queries

We're trying to split the current day into hourly tables so that the hot indexes stay much smaller, letting us support more rows across the day.  We're also using numerics where bigints would do, so we'll work on that as well to see how much smaller we can get it.  Once a daily table is no longer "today", we will remove duplicates, so we can combine that step with rolling the hourly tables up into one daily table.

In a *small* test (1-2 orders of magnitude smaller than some potential customer environments), the cumulative size of the daily indexes is 3.6 GB and that's for only about half of the test.

We're talking about 4 different daily partitioned tables, each with 1-6 indexes (yes, a lot!).
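To make the plan concrete, here's a rough sketch of what one hourly child table could look like using inheritance-based partitioning (the table and column names are hypothetical, not our actual schema):

```sql
-- Hypothetical hourly child table, with a CHECK constraint so the
-- planner can use constraint exclusion; names are made up:
CREATE TABLE events_20111201_h00 (
    CHECK (logged_at >= '2011-12-01 00:00'
       AND logged_at <  '2011-12-01 01:00')
) INHERITS (events);

-- Index only once the hour is no longer "hot":
CREATE INDEX events_20111201_h00_logged_at_idx
    ON events_20111201_h00 (logged_at);
```

Rolling the hourly tables up into a daily table could then be a single INSERT ... SELECT DISTINCT, which would also take care of the duplicate removal in the same pass.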

I'll post another update when I have it.

Thanks Leonardo.

On 11/30/2011 10:17 AM, Leonardo Francalanci wrote:
>> We now found (thanks Andres and Snow-Man in #postgresql) that in our
>> tests, after the indexes get too large performance drops significantly
>> and our system limps forward due to disk reads (presumably for the
>> indexes). If we remove the indexes, performance for our entire sample
>> test is great and everything is written to postgresql very quickly.
> It's usually the fact that the data you index is "random" as opposed to,
> say, an always incremented value (could be a timestamp, or a sequence)
> that leads to insert problems with btrees.
>> My question is, what possible routes can I take where we can have both
>> fast inserts (with indexes removed until the end of the day), but still
>> allow a user to query against today's data? Is this even possible? One
>> idea would be possibly have hourly tables for today and as soon as we
>> can try to re-add indexes.
> Yep, that's the only way I've found: use smaller partitions. That leads
> to slower reads (due to the fact that you have to visit more indexes to
> read the same amount of data). But you'll get faster writes.
>
>> Another possible solution might be to stream
>> the data to another "reader" postgres instance that has indexes,
>> although I'm not very versed in replication.
> I don't think you can do that.
> Another option that you have is to use ssd instead of HD for the indexes
> only (that is, having the indexes in a separate tablespace of ssds). The
> problem is that your disks usually can't keep up with the number of
> random writes it takes to update N "random values" btrees; ssd might help.
> Can you post some numbers, such as # of indexes, # of rows you're trying
> to insert per hour etc etc?
>
>


--
Benjamin Johnson
http://getcarbonblack.com/ | <at> getcarbonblack
cell: 312.933.3612

Jesper Krogh | 1 Dec 07:11 2011

Re: Problems with FTS

On 2011-11-30 21:58, Robert Haas wrote:
> The row-count estimates look reasonably accurate, so there's some
> other problem here.  What do you have random_page_cost, seq_page_cost,
> and effective_cache_size set to?  You might try "SET
> random_page_cost=2" or even "SET random_page_cost=0.5; SET
> seq_page_cost=0.3" and see if those settings help
I may be seeing ghosts here, since I've encountered
the same problem.  But the query planner does not
take TOAST into account, so a sequential scan + filter
only costs what it takes to scan the main table; fts fields
are typically large enough to be toasted, so the cost should
be main + toast (amount of pages) + filtering cost.

I posted about it yesterday:

http://archives.postgresql.org/pgsql-hackers/2011-11/msg01754.php
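As a quick sanity check of how much the toast (and index) pages add on top of the main heap for a given table, something like this can be run (the table name is only a placeholder):

```sql
-- Main heap size vs. everything else (toast + indexes) for one table:
SELECT pg_size_pretty(pg_relation_size('mytable'))    AS main_heap,
       pg_size_pretty(pg_total_relation_size('mytable')
                    - pg_relation_size('mytable'))    AS toast_plus_indexes;
```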

If the above problem is on <9.1, note that a patch to properly account
for GIN estimates was added in 9.1, which may also benefit the planning:
http://www.postgresql.org/docs/9.1/static/release-9-1.html

     Improve GIN index scan cost estimation (Teodor Sigaev)

Jesper
-- 
Jesper

--

-- 
Sent via pgsql-performance mailing list (pgsql-performance <at> postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Jeff Janes | 1 Dec 16:06 2011

Re: Guidance Requested - Bulk Inserting + Queries

On Wed, Nov 30, 2011 at 7:27 AM, Benjamin Johnson
<benjamin.johnson <at> getcarbonblack.com> wrote:
> Experts,
>
> Quick Summary: data can now be inserted very quickly via COPY + removing
> indexes, but is there a design or some tricks to still allow someone to
> query while the partition is still active and 'hot' ?
>
> - Postgres 9.1
> - Windows 7 (64-bit) , although this is just for the current test and
> could vary depending on situation
> - We have 4 main tables with daily partitions

How long are the daily partitions kept for?

> - Each table/partition has multiple indexes on it
> - Streaming logs from client machines into our server app which
> processes the logs and tries to shove all that data into these daily
> partitions as fast as it can.

Why shove it in as fast as you can?  If you want to both read and
write at the same time, then focusing first only on writing and
worrying about reading as an after thought seems like the wrong thing
to do.

> - Using COPY and removed original primary key unique constraints to try
> to get it to be as fast as possible (some duplicates are possible)
> - Will remove duplicates in a later step (disregard for this post)
>
> We now found (thanks Andres and Snow-Man in #postgresql) that in our
> tests, after the indexes get too large performance drops significantly
> and our system limps forward due to disk reads (presumably for the
> indexes).

How many hours worth of data can be loaded into the new partition
before the performance knee hits?

After the knee, how does the random disk read activity you see compare
to the maximum random disk reads your IO system can support?  How many
COPYs were you doing at the same time?

During this test, was there background select activity going on, or
was the system only used for COPY?

> If we remove the indexes, performance for our entire sample
> test is great and everything is written to postgresql very quickly.
> This allows us to shove lots and lots of data in (for production
> possibly 100 GB or a TB per day!)

How much do you need to shove in per day?  If you need to insert it,
and index it, and run queries, and deal with maintenance of the older
partitions, then you will need a lot of spare capacity, relative to
just inserting, to do all of those things.  Do you have windows where
there is less insert activity in which other things can get done?

Cheers,

Jeff


Josh Berkus | 1 Dec 19:01 2011

Re: vacuum internals and performance affect

MirrorX,

> So when a transaction is still open from a while back (according to the
> transaction ID), no 'new dead' tuples can be marked as re-usable space for
> new rows, right? By 'new dead' I mean, for example, that there is a
> transaction running from 10.00am (with a specific transaction ID); the
> rows I delete at 11.00am are the ones I am referring to.

With the understanding that what we're actually checking is snapshots
(which are not completely linear) and not timestamps, yes, that's a good
simplification for what happens.

> But for these rows, the 'deleted' ones, does vacuum do anything at all at
> that time? And if so, what is it? Thanks in advance.

No, it does nothing.  What would it do?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Robert Haas | 1 Dec 19:18 2011

Re: Autovacuum Issue

On Wed, Nov 23, 2011 at 12:55 AM, J Ramesh Kumar <rameshj1977 <at> gmail.com> wrote:
> Why is autovacuum running even though I disabled it? Am I missing anything?

As Raghavendra says, anti-wraparound vacuum will always kick in to
prevent a database shutdown.

> And also please share your views on my decision about disable autovacuum for
> my application. I am planning to run vacuum command daily on that small
> table which has frequent updates.

Sounds like a bad plan.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


MirrorX | 1 Dec 20:23 2011

Re: vacuum internals and performance affect

From what I've read and what I've seen in practice, I expected it to do
nothing at all. I just wanted to be absolutely sure, and that's why I asked
here.
Thank you very much for the clarification.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/vacuum-internals-and-performance-affect-tp5033043p5039677.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


Mark Kirkwood | 1 Dec 23:35 2011

Re: Autovacuum Issue

On 02/12/11 07:18, Robert Haas wrote:
>
> And also please share your views on my decision about disable autovacuum for
> my application. I am planning to run vacuum command daily on that small
> table which has frequent updates.
> Sounds like a bad plan.
>

If the table has frequent updates, vacuuming once a day will not control 
space bloat from dead rows... so your small table's storage will become 
very large (even though there are only a few undeleted rows), and 
performance will become terrible.

I would suggest tuning autovacuum to wake up more frequently (c.f. the 
autovacuum_naptime parameter), so your small table stays small.

Also, you didn't mention which version of Postgres you are running. In 8.4 
and later, vacuum (and hence autovacuum) is much smarter about finding dead 
rows to clean up, and should have less impact. You can also control the 
load autovacuum puts on your system (c.f. the autovacuum_vacuum_cost_delay 
parameter).
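As a sketch of that tuning, per-table autovacuum settings (available since 8.4 as storage parameters) could look like the following; the table name and the exact values are only illustrative:

```sql
-- Vacuum the hot table much more aggressively than the default,
-- while throttling its I/O impact:
ALTER TABLE hot_counters SET (
    autovacuum_vacuum_scale_factor = 0.01,  -- trigger at ~1% dead rows
    autovacuum_vacuum_cost_delay   = 10     -- ms pause between cost chunks
);
```

This keeps autovacuum enabled (avoiding the wraparound problem discussed above) while making it visit the small table far more often than a daily manual vacuum would.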

regards

Mark


Scott Marlowe | 2 Dec 04:02 2011

Re: Autovacuum Issue

On Tue, Nov 22, 2011 at 10:55 PM, J Ramesh Kumar <rameshj1977 <at> gmail.com> wrote:
> But autovacuum is running frequently and it impacts the performance of my
> system (high CPU). You can see the autovacuum in pg_stat_activity.

Could you show us the system metrics that led you to believe it was
high CPU usage?  Sometimes people misinterpret the numbers from
utilities like top, iostat, or vmstat, so I'd rather see them myself
if you got them.


Thiago Godoi | 2 Dec 16:55 2011

Intersect/Union X AND/OR

Hi all,

I found this presentation from B. Momjian:

http://momjian.us/main/writings/pgsql/performance.pdf

I'm interested in what he said about "Intersect/Union X AND/OR". Can I find a transcription or a video of this presentation? Can anyone explain it to me?

Thanks,

Thiago Godoi

Bruce Momjian | 2 Dec 20:49 2011

Re: Intersect/Union X AND/OR

Thiago Godoi wrote:
> Hi all,
> 
> I found this presentation from B.  Momjian:
> 
> http://momjian.us/main/writings/pgsql/performance.pdf
> 
> I'm interested in what he said about " Intersect/Union X AND/OR " , Can I
> find a transcription or a video of this presentation? Can anyone explain it
> to me?

Well, there is a recording of the webcast on the EnterpriseDB web site,
but I am afraid they only allow viewing of 3+ hour webcasts by
EnterpriseDB customers.

The idea is that a query that uses an OR can be rewritten as two SELECTs
with a UNION between them.  I have seen rare cases where this is a win,
so I mentioned it in that talk.  Intersection is similarly possible for
AND in WHERE clauses.
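A minimal illustration of the rewrite (the table and columns are made up):

```sql
-- This OR ...
SELECT * FROM t WHERE a = 1 OR b = 2;

-- ... can be rewritten as two separate (possibly index-driven) scans.
-- Note that UNION also removes duplicate rows:
SELECT * FROM t WHERE a = 1
UNION
SELECT * FROM t WHERE b = 2;

-- Similarly, an AND can become an INTERSECT:
SELECT * FROM t WHERE a = 1 AND b = 2;
SELECT * FROM t WHERE a = 1
INTERSECT
SELECT * FROM t WHERE b = 2;
```

Whether this wins depends on the indexes available: the UNION form can let each branch use its own index instead of forcing a single scan with a combined filter.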

-- 
  Bruce Momjian  <bruce <at> momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +


