Re: WAL Optimisation - configuration and usage
Simon Riggs <simon <at> 2ndquadrant.com>
2004-03-02 00:27:51 GMT
>Rob Fielding wrote:
> My focus today has been on WAL - I've not looked at WAL before. By
> increasing the settings thus :
> wal_buffers = 64 # need to determin WAL usage
> wal_files = 64 # range 0-64
> wal_sync_method = fsync # the default varies across platforms:
> wal_debug = 0 # range 0-16
> # hopefully this should see less LogFlushes per LogInsert - use more
> commit_delay = 10000 # range 0-100000
> commit_siblings = 2 # range 1-1000
> checkpoint_segments = 16 # in logfile segments (16MB each), min 1
> checkpoint_timeout = 300 # in seconds, range 30-3600
> fsync = true
> But I think we can get more out of this as the above setting were
> from thin air and my concern here is being able to determin WAL file
> usage and if the system is caught out on the other extreme that we're
> not commiting fast enough. Currently I've read that WAL files
> be more than 2*checkpoint_segments+1 however my pg_xlog directory
> contains 74 files. This suggests I'm using more logfiles than I
> Also I'm not sure what wal_buffers really should be set to.
As Richard Huxton says, we're all learning...I'm looking at WAL logic
for other reasons right now...
This is based upon my reading of the code; I think the manual contains
at least one confusion that has not assisted your understanding (or
The WAL files limit of 2*checkpoint_segments+1 refers to the number of
files allocated-in-advance of the current log, not the total number of
files in use. pg uses a cycle of logs, reusing older ones when all the
transactions in those log files have been checkpointed. The limit is set
to allow checkpoint to release segments and have them all be reused at
once. Pg stores them up for use again later when workload hots up again.
If it cannot recycle a file because there is a still-current txn on the
end of the cycle, then it will allocate a new file and use this instead,
but still keeping everything in a cycle. Thus if transactions are
particularly long running, then the number of files in the cycle will
grow. So overall, normal behaviour so far. I don't think there's
anything to worry about in having that many files in your xlog cycle.
That behaviour is usually seen with occasional long running txns. When a
long running transaction is over, pg will try to reduce the number of
files in the cycle until its back to target.
You seem to be reusing one file in the cycle every 10 mins - this is
happening as the result of a checkpoint timeout - "kinda automated" as
you say. [A checkpoint is the only time you can get the messages you're
getting] At one file per checkpoint, it will take 16*2+1=33
checkpoints*10 mins = 5 hours before it hits the advance allocation file
limit and then starts to reduce number of files. That's why they appear
to stay constant...
If you want to check whether this is correct, manually issue a number of
CHECKPOINT statements. The messages should change from "recycled" to
"removing" transaction log file once you've got to 33 checkpoints - the
number of WAL log files should start to go down also? If so, then
there's nothing too strange going on, just pg being a little slow in
reducing the number of wal log files.
So, it seems that you are running occasional very long transactions.
During that period you run up to 60-80 wal files. That's just on the
edge of your wal_buffers limit, which means you start to write wal
quicker than you'd like past that point. Your checkpoint_timeout is 300
seconds, but a checkpoint will also be called every checkpoint_segments,
or currently every 16 wal files. Since you go as high as 60-80 then you
are checkpointing 4-5 times during the heavy transaction period -
assuming it's all one block of work. In the end, each checkpoint is
causing a huge I/O storm, during which not much work happens.
I would suggest that you reduce the effect of checkpointing by either:
- re-write app to do scan deletes in smaller chunks in quieter periods
- increase checkpoint_segments to 128, though this may effect your
You can of course only do so much with the memory available to you. If
you increase one allocation of memory, you may have to reduce another
parameter and that may be counter productive.
[An alternative view is that you should go for more frequent, not less
frequent checkpoints in this situation, smoothing out the effect of the
checkpoints, rather than trying to avoid them at all. On the other hand,
that approach also increases total WAL log volume, which means you'll
make poor use of I/O and memory buffering. I'd stay high.]
However, I'm not sure
- why checkpoint interval of 300 secs causes them to happen every 10
mins in quieter periods; is that an occaisional update occurring?
- why checkpoint only releases single Wal file each time - but that
maybe me just reading the code incorrectly.
Please set WAL_DEBUG to 1 so we can see a bit more info: thanks.
> Can I get any feedback on this ? How to look into pg's WAL usage would
> be what I'm looking for. BTW this is an old install I'm afraid 7.2.2 -
> it's been impossible to upgrade up until now because it's been too
> I have moved the pg_xlog onto the root SCSI disk - it doesn't appear
> have made a huge difference but it could be on the same cable.
My advice is don't touch WAL_SYNC_METHOD...
I **think** the WAL behaviour is still the same in 7.4.1, so no rush to
upgrade on that account - unless you're using temporary tables....
Best Regards, Simon Riggs
---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo <at> postgresql.org)