Re: Guidance Requested - Bulk Inserting + Queries
2011-12-01 02:00:56 GMT
In a *small* test (1-2 orders of magnitude smaller than some potential customer environments), the cumulative size of the daily indexes is 3.6 GB and that's for only about half of the test.
We're talking 4 different daily partitioned tables with each table having 1 - 6 indexes (yes, a lot!).
I'll post another update when I have it.
On 11/30/2011 10:17 AM, Leonardo Francalanci wrote:
>> We now found (thanks Andres and Snow-Man in #postgresql) that in our
>> tests, after the indexes get too large performance drops signficantly
>> and our system limps forward due to disk reads (presumably for the
>> indexes). If we remove the indexes, performance for our entire sample
>> test is great and everything is written to postgresql very quickly.
> It's usually the fact that the data you index is "random" as opposed to,
> say, an always incremented value (could be a timestamp, or a sequence)
> that leads to insert problems with btrees.
>> My question is, what possible routes can I take where we can have both
>> fast inserts (with indexes removed until the end of the day), but still
>> allow a user to query against today's data? Is this even possible? One
>> idea would be possibly have hourly tables for today and as soon as we
>> can try to re-add indexes.
> Yep, that's the only way I've found: use smaller partitions. That leads
> to slower reads (due to the fact that you have to visit more indexes to
> read the same amount of data). But you'll get faster writes.
>> Another possible solution might be to stream
>> the data to another "reader" postgres instance that has indexes,
>> although I'm not very versed in replication.
> I don't think you can do that.
> Another option that you have is to use ssd instead of HD for the indexes
> only (that is, having the indexes in a separate tablespace of ssds). The
> problem is that your disks usually can't keep up with the number of
> random writes it takes to update N "random values" btrees; ssd might help.
> Can you post some numbers, such as # of indexes, # of rows you're trying
> to insert per hour etc etc?
http://getcarbonblack.com/ | <at> getcarbonblack