Alvaro Herrera | 1 May 05:08 2010

Re: autovacuum strategy / parameters

Josh Berkus wrote:

> #autovacuum_vacuum_scale_factor = 0.2
> 
> This is set because in my experience, 20% bloat is about the level at
> which bloat starts affecting performance; thus, we want to vacuum at
> that level but not sooner.  This does mean that very large tables which
> never have more than 10% updates/deletes don't get vacuumed at all until
> freeze_age; this is a *good thing*. VACUUM on large tables is expensive;
> you don't *want* to vacuum a billion-row table which has only 100,000
> updates.
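
For concreteness, the trigger point described above follows the documented autovacuum rule: a table is vacuumed once its dead tuples exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples. A minimal Python sketch of that arithmetic, using the billion-row example (the 50-tuple base threshold is the stock default):

```python
# Autovacuum trigger math as described in the PostgreSQL docs:
# vacuum fires once dead tuples exceed
#   autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples

def vacuum_trigger(reltuples, scale_factor=0.2, base_threshold=50):
    """Dead-tuple count at which autovacuum kicks in for a table."""
    return base_threshold + scale_factor * reltuples

big_table = 1_000_000_000            # Josh's billion-row example
print(vacuum_trigger(big_table))     # 200000050.0 dead tuples needed
print(100_000 >= vacuum_trigger(big_table))  # False: 100k updates never trigger it
```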

Hmm, now that we have partial vacuum, perhaps we should revisit this.

> It would be worth doing a DBT2/DBT5 test run with different autovac
> settings post-8.4 to see if we should specifically change the vacuum
> threshold.

Right.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

--

-- 
Sent via pgsql-performance mailing list (pgsql-performance <at> postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Cédric Villemain | 1 May 12:52 2010

Re: Optimization idea

2010/4/28 Robert Haas <robertmhaas <at> gmail.com>:
> On Mon, Apr 26, 2010 at 5:33 AM, Cédric Villemain
> <cedric.villemain.debian <at> gmail.com> wrote:
>> In the first query, the planner doesn't use the information from the
>> values 2, 3, 4.  It just does: I'll bet I'll have 2 rows in t1 (I
>> think it should say 3, but it doesn't).  So it divides the estimated
>> number of rows in the t2 table by 5 (distinct values) and multiplies
>> by 2 (rows): 40040.
>
> I think it's doing something more complicated.  See scalararraysel().
>
>> In the second query the planner uses a different behavior: it
>> expands the value of t1.t to t2.t for each join relation and finds a
>> cheaper plan (than the one using a seqscan on t2).
>
> I think the problem here is one we've discussed before: if the query
> planner knows that something is true of x (like, say, x =
> ANY('{2,3,4}')) and it also knows that x = y, it doesn't infer that
> the same thing holds of y (i.e. y = ANY('{2,3,4}')) unless the thing
> that is known to be true of x is that x is equal to some constant.
> Tom doesn't think it would be worth the additional CPU time that it
> would take to make these sorts of deductions.  I'm not sure I believe
> that, but I haven't tried to write the code, either.

Related to this, too:
http://archives.postgresql.org/pgsql-general/2010-05/msg00009.php ?

>
> ...Robert
>
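
The 40040 figure discussed above can be reconstructed; the total row estimate for t2 is not stated in the thread, so the number below is an assumption chosen purely because it reproduces the arithmetic:

```python
# Hypothetical reconstruction of the planner estimate Cédric describes.
# The planner's row estimate for t2 is NOT given in the thread; ~100,100
# rows is assumed here only because it reproduces the quoted figure.

t2_estimated_rows = 100_100   # assumed planner estimate for t2 (hypothetical)
n_distinct_t = 5              # distinct values of t2.t
t1_estimated_rows = 2         # planner's row estimate for t1

estimate = t2_estimated_rows / n_distinct_t * t1_estimated_rows
print(estimate)  # 40040.0
```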

Robert Haas | 1 May 13:39 2010

Re: autovacuum strategy / parameters

On Fri, Apr 30, 2010 at 6:50 PM, Josh Berkus <josh <at> agliodbs.com> wrote:
> Which is the opposite of my experience; currently we have several
> clients who have issues which required more-frequent analyzes on
> specific tables.

That's all fine, but probably not too relevant to the original
complaint - the OP backed off the default settings by several orders
of magnitude, which might very well cause a problem with both VACUUM
and ANALYZE.

I don't have a stake in the ground on what the right settings are, but
I think it's fair to say that if you vacuum OR analyze much less
frequently than what we recommend by default, it might break.

...Robert


Cédric Villemain | 1 May 14:00 2010

Re: Optimization idea

2010/5/1 Cédric Villemain <cedric.villemain.debian <at> gmail.com>:
> 2010/4/28 Robert Haas <robertmhaas <at> gmail.com>:
>> On Mon, Apr 26, 2010 at 5:33 AM, Cédric Villemain
>> <cedric.villemain.debian <at> gmail.com> wrote:
>>> In the first query, the planner doesn't use the information from the
>>> values 2, 3, 4.  It just does: I'll bet I'll have 2 rows in t1 (I
>>> think it should say 3, but it doesn't).  So it divides the estimated
>>> number of rows in the t2 table by 5 (distinct values) and multiplies
>>> by 2 (rows): 40040.
>>
>> I think it's doing something more complicated.  See scalararraysel().
>>
>>> In the second query the planner uses a different behavior: it
>>> expands the value of t1.t to t2.t for each join relation and finds a
>>> cheaper plan (than the one using a seqscan on t2).
>>
>> I think the problem here is one we've discussed before: if the query
>> planner knows that something is true of x (like, say, x =
>> ANY('{2,3,4}')) and it also knows that x = y, it doesn't infer that
>> the same thing holds of y (i.e. y = ANY('{2,3,4}')) unless the thing
>> that is known to be true of x is that x is equal to some constant.
>> Tom doesn't think it would be worth the additional CPU time that it
>> would take to make these sorts of deductions.  I'm not sure I believe
>> that, but I haven't tried to write the code, either.
>
> Related to this, too:
> http://archives.postgresql.org/pgsql-general/2010-05/msg00009.php ?

No, sorry, I misread the part about prepared statements in the other thread ...


Scott Marlowe | 1 May 18:08 2010

Re: autovacuum strategy / parameters

On Wed, Apr 28, 2010 at 8:20 AM, Thomas Kellerer <spam_eater <at> gmx.net> wrote:
> Rick, 22.04.2010 22:42:
>>
>> So, in a large table, the scale_factor is the dominant term. In a
>> small table, the threshold is the dominant term. But both are taken into
>> account.
>>
>> The default values are tuned for small tables; with them, autovacuum
>> is effectively not being run for large tables.
>
> With 8.4 you can adjust the autovacuum settings per table...

You can with 8.3 as well, but it's done via entries in the
pg_autovacuum table rather than ALTER TABLE.
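
A sketch of both approaches, on a hypothetical table name; the 8.3 pg_autovacuum column layout below is from memory, so check the catalog definition before relying on it:

```sql
-- 8.4+: per-table settings as storage parameters
ALTER TABLE accounts
  SET (autovacuum_vacuum_scale_factor = 0.05,
       autovacuum_vacuum_threshold = 1000);

-- 8.3: a row in the pg_autovacuum system table instead
-- (-1 in a column means "fall back to the global setting")
INSERT INTO pg_autovacuum
  VALUES ('accounts'::regclass, true,
          1000, 0.05,   -- vac_base_thresh, vac_scale_factor
          -1, -1,       -- anl_base_thresh, anl_scale_factor
          -1, -1,       -- vac_cost_delay, vac_cost_limit
          -1, -1);      -- freeze_min_age, freeze_max_age
```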


Scott Marlowe | 1 May 18:13 2010

Re: autovacuum strategy / parameters

On Fri, Apr 30, 2010 at 4:50 PM, Josh Berkus <josh <at> agliodbs.com> wrote:
> Which is the opposite of my experience; currently we have several
> clients who have issues which required more-frequent analyzes on
> specific tables.   Before 8.4, vacuuming more frequently, especially on
> large tables, was very costly; vacuum takes a lot of I/O and CPU.  Even
> with 8.4 it's not something you want to increase without thinking about
> the tradeoff

Actually I would think that statement should be that before 8.3
vacuum was much more expensive.  The changes to vacuum for 8.4 mostly
had to do with moving FSM to disk, making seldom vacuumed tables
easier to keep track of, and making autovac work better in the
presence of long running transactions.  The ability to tune IO load
etc was basically unchanged in 8.4.


Greg Smith | 1 May 19:11 2010

Re: autovacuum strategy / parameters

Robert Haas wrote:
> I don't have a stake in the ground on what the right settings are, but
> I think it's fair to say that if you vacuum OR analyze much less
> frequently than what we recommend by default, it might break.
>   

I think the default settings are essentially minimum recommended 
frequencies.  They aren't too terrible for the giant data warehouse case 
Josh was suggesting they came from--waiting until there's 20% worth of 
dead stuff before kicking off an intensive vacuum is OK when vacuum is 
expensive and you're mostly running big queries anyway.  And for smaller 
tables, the threshold helps it kick in a little earlier.  It's unlikely 
anyone wants to *increase* those, so that autovacuum runs even less; out 
of the box it's not tuned to run very often at all.

If anything, I'd expect people to want to increase how often it runs, 
for tables where much less than 20% dead is a problem.  The most common 
situation I've seen where that's the case is when you have a hotspot of 
heavily updated rows in a large table, and this may match some of the 
situations that Robert was alluding to seeing.  Let's say you have a big 
table where 0.5% of the users each update their respective records 
heavily, averaging 30 times each.  That's only going to result in 15% 
dead rows, so no autovacuum.  But latency for those users will suffer 
greatly, because they might have to do lots of seeking around to get 
their little slice of the data.
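
Greg's hotspot arithmetic, spelled out with the numbers above:

```python
# A hotspot can generate heavy churn while never crossing the 20% trigger.

hot_fraction = 0.005       # 0.5% of rows are heavily updated
updates_per_hot_row = 30   # average updates per hot row

dead_fraction = hot_fraction * updates_per_hot_row
print(dead_fraction)        # 0.15 -> 15% dead rows
print(dead_fraction < 0.20) # True: the default scale factor never fires
```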

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg <at> 2ndQuadrant.com   www.2ndQuadrant.us

Tom Lane | 1 May 19:25 2010

Re: autovacuum strategy / parameters

Greg Smith <greg <at> 2ndquadrant.com> writes:
> If anything, I'd expect people to want to increase how often it runs, 
> for tables where much less than 20% dead is a problem.  The most common 
> situation I've seen where that's the case is when you have a hotspot of 
> heavily updated rows in a large table, and this may match some of the 
> situations that Robert was alluding to seeing.  Let's say you have a big 
> table where 0.5% of the users each update their respective records 
> heavily, averaging 30 times each.  That's only going to result in 15% 
> dead rows, so no autovacuum.  But latency for those users will suffer 
> greatly, because they might have to do lots of seeking around to get 
> their little slice of the data.

With a little luck, HOT will alleviate that case, since HOT updates can
be reclaimed without running vacuum per se.  I agree there's a risk
there though.

Now that partial vacuum is available, it'd be a real good thing to
revisit these numbers.

			regards, tom lane


Robert Haas | 1 May 21:08 2010

Re: autovacuum strategy / parameters

On Sat, May 1, 2010 at 12:13 PM, Scott Marlowe <scott.marlowe <at> gmail.com> wrote:
> On Fri, Apr 30, 2010 at 4:50 PM, Josh Berkus <josh <at> agliodbs.com> wrote:
>> Which is the opposite of my experience; currently we have several
>> clients who have issues which required more-frequent analyzes on
>> specific tables.   Before 8.4, vacuuming more frequently, especially on
>> large tables, was very costly; vacuum takes a lot of I/O and CPU.  Even
>> with 8.4 it's not something you want to increase without thinking about
>> the tradeoff
>
> Actually I would think that statement should be that before 8.3
> vacuum was much more expensive.  The changes to vacuum for 8.4 mostly
> had to do with moving FSM to disk, making seldom vacuumed tables
> easier to keep track of, and making autovac work better in the
> presence of long running transactions.  The ability to tune IO load
> etc was basically unchanged in 8.4.

What about http://www.postgresql.org/docs/8.4/static/storage-vm.html ?

...Robert


Scott Marlowe | 1 May 21:17 2010

Re: autovacuum strategy / parameters

On Sat, May 1, 2010 at 1:08 PM, Robert Haas <robertmhaas <at> gmail.com> wrote:
> On Sat, May 1, 2010 at 12:13 PM, Scott Marlowe <scott.marlowe <at> gmail.com> wrote:
>> On Fri, Apr 30, 2010 at 4:50 PM, Josh Berkus <josh <at> agliodbs.com> wrote:
>>> Which is the opposite of my experience; currently we have several
>>> clients who have issues which required more-frequent analyzes on
>>> specific tables.   Before 8.4, vacuuming more frequently, especially on
>>> large tables, was very costly; vacuum takes a lot of I/O and CPU.  Even
>>> with 8.4 it's not something you want to increase without thinking about
>>> the tradeoff
>>
>> Actually I would think that statement should be that before 8.3
>> vacuum was much more expensive.  The changes to vacuum for 8.4 mostly
>> had to do with moving FSM to disk, making seldom vacuumed tables
>> easier to keep track of, and making autovac work better in the
>> presence of long running transactions.  The ability to tune IO load
>> etc was basically unchanged in 8.4.
>
> What about http://www.postgresql.org/docs/8.4/static/storage-vm.html ?

That really only has an effect on tables that aren't updated very
often.  Unless you've got a whole bunch of those, it's not that big of
a deal.


