Dan Ports | 1 Mar 2011 01:03

Re: SSI bug?

An updated patch to address this issue is attached. It fixes a couple of
issues related to the use of the backend-local lock table hint:

  - CheckSingleTargetForConflictsIn now correctly handles the case
    where a lock that's being held is not reflected in the local lock
    table. This fixes the assertion failure reported in this thread.

  - PredicateLockPageCombine now retains locks for the page that is
    being removed, rather than removing them. This prevents a
    potentially dangerous false-positive inconsistency where the local
    lock table believes that a lock is held, but it is actually not.

  - Adds more comments documenting the cases in which the local lock
    table can be inconsistent with reality, as reflected in the shared
    memory table.
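
The hint discipline behind these fixes can be illustrated with a toy model. The shared table is authoritative and the backend-local table is only a hint. A local miss is harmless as long as the code falls back to the shared table, whereas a local entry for a lock that is not actually held is the dangerous false-positive direction. The sketch is hypothetical: `LockTable` and all function names are invented (not taken from predicate.c), and lock targets are modelled as small integers.

```c
#include <stdbool.h>

/* Hypothetical toy model: a lock target is identified by a small integer. */
#define MAX_TARGETS 64

typedef struct {
    bool held[MAX_TARGETS];   /* held[t] is true if a lock on target t is recorded */
} LockTable;

/*
 * 'shared' is authoritative; 'local' is the backend's cheap hint.
 * A local "yes" is trusted only because the hint must never claim a lock
 * the shared table lacks (see hint_is_safe). A local "no" proves nothing,
 * so fall back to the shared table.
 */
bool lock_is_held(const LockTable *local, const LockTable *shared, int target)
{
    if (local->held[target])
        return true;
    return shared->held[target];
}

/* The invariant that keeps the hint safe: local implies shared. */
bool hint_is_safe(const LockTable *local, const LockTable *shared)
{
    for (int i = 0; i < MAX_TARGETS; i++)
        if (local->held[i] && !shared->held[i])
            return false;   /* false positive: the dangerous direction */
    return true;
}

/*
 * Merging page 'from' into page 'into': the surviving page inherits the
 * lock, and the lock on 'from' is retained rather than removed, so a local
 * hint that still lists 'from' does not become a false positive.
 */
void combine_pages(LockTable *shared, int from, int into)
{
    if (shared->held[from])
        shared->held[into] = true;
    /* deliberately leave shared->held[from] set */
}
```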

This patch also incorporates Kevin's changes to copy locks when
creating a new version of a tuple rather than trying to maintain a
linkage between different versions. So this is a patch that should
apply against HEAD and addresses all outstanding SSI bugs known to
Kevin or myself.
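
Copying locks means that an UPDATE creating a new tuple version duplicates every predicate lock on the old version onto the new one, so no linkage between versions has to be maintained. A minimal sketch, assuming locks are kept in a flat array of (transaction, tuple id) pairs; `PredLock` and `copy_locks_to_new_version` are hypothetical names, not the patch's data structures:

```c
#include <stddef.h>

typedef struct {
    int xact;   /* hypothetical id of the serializable transaction holding the lock */
    int tid;    /* hypothetical id of the locked tuple version */
} PredLock;

/*
 * On creating a new tuple version, duplicate every lock on old_tid onto
 * new_tid and return the new lock count. Only locks that existed before
 * the call are scanned; copying stops early if the array reaches 'cap'.
 */
size_t copy_locks_to_new_version(PredLock *locks, size_t n, size_t cap,
                                 int old_tid, int new_tid)
{
    size_t orig_n = n;
    for (size_t i = 0; i < orig_n && n < cap; i++) {
        if (locks[i].tid == old_tid) {
            locks[n].xact = locks[i].xact;   /* same holder ... */
            locks[n].tid = new_tid;          /* ... now also covers the new version */
            n++;
        }
    }
    return n;
}
```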

Besides the usual regression and isolation tests, I have tested this
by running DBT-2 on a 16-core machine to verify that there are no
assertion failures that only show up under concurrent access.

Dan

-- 
Dan R. K. Ports              MIT CSAIL                http://drkp.net/

Rod Taylor | 1 Mar 2011 01:18

Re: WIP: cross column correlation ...

>> But it's not the same as tracking *sections of a table*.

> I dunno.  I imagine if you have a "section" of a table in different
> storage than other sections, you created a tablespace and moved the
> partition holding that section there.  Otherwise, how do you prevent the
> tuples from moving to other "sections"?  (We don't really have a concept
> of "sections" of a table.)

A section could be as simple as the inner or outer part of a single
disk, or as complicated as the SSD cache in front of a spinning disk, or
the multi-gigabyte cache on the RAID card or SAN, because that data is
consistently accessed.

Section is the wrong word. If primary key values under 10 million are
consistently accessed, they will be cached even if they do get moved
through the structure. Values over 10M may be fast if they happen to be
on the same page as one of the hot values, but probably aren't.

This is very evident when dealing with time-based data in what can be a
very large structure: 1% may be very hot and in memory while 99% is not.

Partitioning only helps if you can predict what will be hot in the
future. Sometimes an outside source (world events) affects which section
of the structure is hot.

regards,

Rod

Simon Riggs | 1 Mar 2011 01:19

Re: Sync Rep v17

On Mon, 2011-02-28 at 18:40 +0000, Simon Riggs wrote:
> > SyncRepReleaseWaiters should be called when walsender exits. Otherwise,
> > if the standby crashes while a transaction is waiting for replication,
> > it waits infinitely.
> 
> Will think on this.

The behaviour seems correct to me:

If allow_standalone_primary = off then you wish to wait forever (at your
request...)

If allow_standalone_primary = on then we sit and wait until we hit the
client timeout, which occurs even after the last standby has gone.

-- 
 Simon Riggs           http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services

--

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers <at> postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Joachim Wieland | 1 Mar 2011 05:27

Re: Snapshot synchronization, again...

On Mon, Feb 28, 2011 at 6:38 PM, Robert Haas <robertmhaas <at> gmail.com> wrote:
>> Remember that it's not only about saving shared memory, it's also
>> about making sure that the snapshot reflects a state of the database
>> that has actually existed at some point in the past. Furthermore, we
>> can easily invalidate a snapshot that we have published earlier by
>> deleting its checksum in shared memory as soon as the original
>> transaction commits/aborts. And for these two a checksum seems to be a
>> good fit. Saving memory then comes as a benefit and makes all those
>> happy who don't want to argue about how many slots to reserve in
>> shared memory or don't want to have another GUC for what will probably
>> be a low-usage feature.
>
> But you can do all of this with files too, can't you?  Just remove or
> truncate the file when the snapshot is no longer valid.

Sure we can, but it looked like the consensus of the first discussion
was that the through-the-client approach was more flexible. But then
again nobody is actively arguing for that anymore.
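
The checksum scheme quoted above can be sketched as follows. The exporting backend publishes only a checksum of its serialized snapshot in shared memory and hands the snapshot text to the client; an importing backend accepts client-supplied text only if its checksum matches a published one; and the exporter deletes its checksum when it commits or aborts, which invalidates the snapshot. All names are hypothetical, the per-backend slot array is an assumption, and 64-bit FNV-1a is only an illustrative stand-in for whatever checksum a real patch would use.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BACKENDS 8   /* hypothetical: one published checksum per backend */

/* Stand-in for shared memory; 0 means "nothing published". */
static uint64_t published[MAX_BACKENDS];

/* 64-bit FNV-1a over the snapshot text (illustrative, non-cryptographic). */
static uint64_t snapshot_checksum(const char *snap)
{
    uint64_t h = 14695981039346656037ULL;   /* FNV offset basis */
    for (size_t i = 0; snap[i] != '\0'; i++) {
        h ^= (unsigned char) snap[i];
        h *= 1099511628211ULL;              /* FNV prime */
    }
    return h;
}

/* Exporter: publish only the checksum; the snapshot text goes to the client. */
void publish_snapshot(int backend, const char *snap)
{
    published[backend] = snapshot_checksum(snap);
}

/* Exporter commits or aborts: deleting the checksum invalidates the snapshot. */
void invalidate_snapshot(int backend)
{
    published[backend] = 0;
}

/* Importer: accept client-supplied text only if it matches a live checksum. */
bool snapshot_import_ok(int backend, const char *snap)
{
    return published[backend] != 0 &&
           published[backend] == snapshot_checksum(snap);
}
```

A client that alters the snapshot text it was handed no longer matches the published checksum, which is how the scheme ensures the imported snapshot reflects a state that actually existed.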

Tom Lane | 1 Mar 2011 05:29

Re: Review: Fix snapshot taking inconsistencies

Marko Tiikkaja <marko.tiikkaja <at> cs.helsinki.fi> writes:
> On 2011-02-28 9:36 PM, Tom Lane wrote:
>> OK, so the intent is that in all cases, we just advance CID and don't
>> take a new snapshot between queries that were generated (by rule
>> expansion) from a single original parsetree?  But we still take a new
>> snap between original parsetrees?  Works for me.

> Exactly.

OK, applied with corrections (I didn't think either the spi.c or
functions.c changes were quite right).
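
The rule agreed above (a fresh snapshot for each original parsetree, but only a command-ID advance between the queries that rule expansion generates from one parsetree) can be modelled as a toy simulation. This is not the spi.c or functions.c code: `ExecState` and `run_parsetrees` are hypothetical, and bumping the CID after every query is a simplification. `rewritten[i]` is the number of queries produced from the i-th original parsetree, and `snap_of[k]` records which snapshot query k ran under.

```c
#include <stddef.h>

/* Hypothetical toy model of the bookkeeping between queries. */
typedef struct {
    int snapshots_taken;   /* number of fresh snapshots taken so far */
    int command_id;        /* current command id (CID) */
} ExecState;

/* Returns the number of queries "executed" (at most cap). */
size_t run_parsetrees(ExecState *st, const int *rewritten, size_t ntrees,
                      int *snap_of, size_t cap)
{
    size_t k = 0;
    for (size_t i = 0; i < ntrees; i++) {
        st->snapshots_taken++;                  /* new snapshot per original parsetree */
        for (int q = 0; q < rewritten[i] && k < cap; q++) {
            snap_of[k++] = st->snapshots_taken; /* rule-expanded queries share it */
            st->command_id++;                   /* advance CID so the next query
                                                   sees this one's effects */
        }
    }
    return k;
}
```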

			regards, tom lane

Fujii Masao | 1 Mar 2011 07:25

Re: Sync Rep v17

On Tue, Mar 1, 2011 at 9:19 AM, Simon Riggs <simon <at> 2ndquadrant.com> wrote:
> On Mon, 2011-02-28 at 18:40 +0000, Simon Riggs wrote:
>> > SyncRepReleaseWaiters should be called when walsender exits. Otherwise,
>> > if the standby crashes while a transaction is waiting for replication,
>> > it waits infinitely.
>>
>> Will think on this.
>
> The behaviour seems correct to me:
>
> If allow_standalone_primary = off then you wish to wait forever (at your
> request...)

No, I've never wanted a wait-forever option, at least for now. I'd
like to make the primary work alone when there is no connected
standby, for high availability.

> If allow_standalone_primary = on then we sit and wait until we hit the
> client timeout, which occurs even after the last standby has gone.

In that case, why do backends need to wait until the timeout occurs?
We can make those backends resume their transaction as soon as
the last standby has gone. No?
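
The two positions can be compared as a decision function that a waiting backend evaluates each time it wakes up. This is a hypothetical sketch: `WaitState`, the enum and both function names are invented for illustration; only `allow_standalone_primary` comes from the thread.

```c
#include <stdbool.h>

typedef enum { KEEP_WAITING, RESUME_CONFIRMED, RESUME_UNREPLICATED } WaitAction;

/* Hypothetical view of what a waiting backend can observe. */
typedef struct {
    bool standby_confirmed;         /* sync standby acknowledged our commit */
    bool allow_standalone_primary;  /* the GUC under discussion */
    bool timeout_expired;           /* client-side replication timeout passed */
    int  connected_standbys;        /* standbys currently connected */
} WaitState;

/* Behaviour as described for v17: wait for the timeout even with no standby. */
WaitAction wait_policy_v17(const WaitState *s)
{
    if (s->standby_confirmed)
        return RESUME_CONFIRMED;
    if (!s->allow_standalone_primary)
        return KEEP_WAITING;                      /* wait forever */
    return s->timeout_expired ? RESUME_UNREPLICATED : KEEP_WAITING;
}

/* Proposed alternative: also resume as soon as the last standby has gone. */
WaitAction wait_policy_proposed(const WaitState *s)
{
    if (s->standby_confirmed)
        return RESUME_CONFIRMED;
    if (!s->allow_standalone_primary)
        return KEEP_WAITING;
    if (s->timeout_expired || s->connected_standbys == 0)
        return RESUME_UNREPLICATED;
    return KEEP_WAITING;
}
```

With allow_standalone_primary = on, no standby connected and the timeout not yet reached, the first policy keeps waiting while the second resumes immediately; that is exactly the difference in question.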

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Fujii Masao | 1 Mar 2011 07:51

Re: Sync Rep v17

Thanks for the updated patch!

On Tue, Mar 1, 2011 at 3:40 AM, Simon Riggs <simon <at> 2ndquadrant.com> wrote:
>> SyncRepRemoveFromQueue does not seem short-term enough for a
>> spinlock. An LWLock should be used there instead.

You seem to have forgotten to fix the above-mentioned issue.
A spinlock can be used only for very short-term operations, such as
reading or writing a few shared variables. The operation on the queue
is not short, so it should be protected by an LWLock, I think.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Fujii Masao | 1 Mar 2011 08:28

Re: Sync Rep v17

On Tue, Mar 1, 2011 at 3:39 AM, Simon Riggs <simon <at> 2ndquadrant.com> wrote:
>> Should PREPARE TRANSACTION and ROLLBACK PREPARED wait for
>> replication as well as COMMIT PREPARED?
>
> PREPARE - Yes
> ROLLBACK - No
>
> Further discussion welcome

If we don't make ROLLBACK PREPARED wait for replication, we might need to
issue ROLLBACK PREPARED to the new master again after failover, even if
we've already received a success indication of ROLLBACK PREPARED from the
old master. This looks strange to me because, OTOH, in the simple
COMMIT/ROLLBACK case we don't need to reissue the command to the new
master after failover.

>> What if fast shutdown is requested while RecordTransactionCommit
>> is waiting in SyncRepWaitForLSN? ISTM fast shutdown cannot complete
>> until replication has been successfully done (i.e., until at least one
>> synchronous standby has connected to the master, especially if
>> allow_standalone_primary is disabled). Is this OK?
>
> A "behaviour" - important, though needs further discussion.

One of the scenarios I'm concerned about is:

1. The primary is running with allow_standalone_primary = on.
2. While some backends are waiting for replication, the user requests
   fast shutdown.
3. When the timeout expires, those backends stop waiting and return a
   success indication to the client (although the transactions have not
   been replicated to the standby).
4. Since no backend is waiting for replication any more, fast shutdown
   completes.
5. Clusterware such as Pacemaker detects the death of the primary and
   triggers failover.
6. The new primary is missing some transactions that were reported to
   the client as committed, i.e., transactions are lost!!

To avoid such transaction loss, I think we should prevent the primary
from returning a success indication to the client while fast shutdown
is in progress, even if allow_standalone_primary is enabled. Thoughts?
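
The proposed rule can be expressed as a small decision function for a backend whose commit is waiting on the synchronous standby. This is a hypothetical sketch: the function and enum names are invented, and `WITHHOLD_REPLY` stands for whatever a real implementation would do instead of reporting success (for example, terminating the session); the thread does not specify that mechanism.

```c
#include <stdbool.h>

typedef enum { REPORT_SUCCESS, KEEP_WAITING, WITHHOLD_REPLY } CommitReply;

/*
 * Hypothetical decision for a backend waiting on the sync standby.
 * Under the proposal, once fast shutdown has been requested a timeout no
 * longer turns into a success report, because a failover right after the
 * shutdown would lose the unreplicated commit (steps 3 to 6 above).
 */
CommitReply commit_reply(bool standby_confirmed, bool timeout_expired,
                         bool allow_standalone_primary, bool fast_shutdown)
{
    if (standby_confirmed)
        return REPORT_SUCCESS;      /* replicated: always safe to report */
    if (fast_shutdown)
        return WITHHOLD_REPLY;      /* never claim success for an unreplicated commit */
    if (allow_standalone_primary && timeout_expired)
        return REPORT_SUCCESS;      /* standalone mode: success without replication */
    return KEEP_WAITING;
}
```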

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Simon Riggs | 1 Mar 2011 08:56

Re: Sync Rep v17

On Tue, 2011-03-01 at 15:25 +0900, Fujii Masao wrote:
> On Tue, Mar 1, 2011 at 9:19 AM, Simon Riggs <simon <at> 2ndquadrant.com> wrote:
> > On Mon, 2011-02-28 at 18:40 +0000, Simon Riggs wrote:
> >> > SyncRepReleaseWaiters should be called when walsender exits. Otherwise,
> >> > if the standby crashes while a transaction is waiting for replication,
> >> > it waits infinitely.
> >>
> >> Will think on this.
> >
> > The behaviour seems correct to me:
> >
> > If allow_standalone_primary = off then you wish to wait forever (at your
> > request...)
> 
> No, I've never wanted a wait-forever option, at least for now. I'd
> like to make the primary work alone when there is no connected
> standby, for high availability.

Good news, please excuse that reference.

> > If allow_standalone_primary = on then we sit and wait until we hit the
> > client timeout, which occurs even after the last standby has gone.
> 
> In that case, why do backends need to wait until the timeout occurs?
> We can make those backends resume their transaction as soon as
> the last standby has gone. No?

The guarantee provided is that we will wait for up to the client timeout
for the sync standby to confirm. If we stop waiting right at the point
that an "event" occurs, it breaks the whole purpose of the feature.

-- 
 Simon Riggs           http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services

Simon Riggs | 1 Mar 2011 09:21

Re: Sync Rep v17

On Tue, 2011-03-01 at 15:51 +0900, Fujii Masao wrote:
> Thanks for the updated patch!
> 
> On Tue, Mar 1, 2011 at 3:40 AM, Simon Riggs <simon <at> 2ndquadrant.com> wrote:
> >> SyncRepRemoveFromQueue does not seem short-term enough for a
> >> spinlock. An LWLock should be used there instead.
> 
> You seem to have forgotten to fix the above-mentioned issue.

Not forgotten.

> A spinlock can be used only for very short-term operations, such as
> reading or writing a few shared variables. The operation on the queue
> is not short, so it should be protected by an LWLock, I think.

There's no need to sleep while holding locks, and the operations are very
short in most cases. The code around it isn't trivial, but that's no
reason to use LWLocks.

LWLocks are just spinlocks plus semaphore sleeps, so I don't see the need
for that in the current code. Other views welcome.
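
The point that an LWLock is essentially a spinlock plus semaphore sleeps can be illustrated with a minimal exclusive lock: a spinlock guards a few fields for a handful of instructions, and a caller that finds the lock taken sleeps on a semaphore instead of spinning. This is a simplified sketch, not PostgreSQL's lwlock.c (no shared mode, no per-backend wait queue); `SketchLWLock` and its functions are hypothetical, and it relies on C11 `atomic_flag` and POSIX unnamed semaphores, which some platforms (notably macOS) do not support.

```c
#include <semaphore.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_flag spin;   /* spinlock guarding the fields below; held only briefly */
    bool held;          /* is the lock currently owned? */
    int waiters;        /* sleepers that have not yet been handed the lock */
    sem_t sleep_sem;    /* where waiters sleep */
} SketchLWLock;

static void spin_acquire(atomic_flag *f)
{
    while (atomic_flag_test_and_set_explicit(f, memory_order_acquire))
        ;   /* busy-wait: acceptable because every critical section is tiny */
}

static void spin_release(atomic_flag *f)
{
    atomic_flag_clear_explicit(f, memory_order_release);
}

int sketch_lwlock_init(SketchLWLock *l)
{
    atomic_flag_clear(&l->spin);
    l->held = false;
    l->waiters = 0;
    return sem_init(&l->sleep_sem, 0, 0);   /* pshared = 0: threads of one process */
}

void sketch_lwlock_acquire(SketchLWLock *l)
{
    spin_acquire(&l->spin);
    if (!l->held) {
        l->held = true;
        spin_release(&l->spin);
        return;
    }
    l->waiters++;
    spin_release(&l->spin);                  /* never sleep while holding the spinlock */
    while (sem_wait(&l->sleep_sem) != 0)
        ;                                    /* retry if interrupted by a signal */
    /* woken by release: ownership was handed to us, so 'held' is still true */
}

void sketch_lwlock_release(SketchLWLock *l)
{
    spin_acquire(&l->spin);
    if (l->waiters > 0) {
        l->waiters--;                        /* hand ownership directly to one waiter */
        spin_release(&l->spin);
        sem_post(&l->sleep_sem);
    } else {
        l->held = false;
        spin_release(&l->spin);
    }
}
```

The spinlock is released before `sem_wait`, so nobody sleeps while holding it, and the semaphore's count ensures that a wakeup posted before the waiter actually goes to sleep is not lost.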

-- 
 Simon Riggs           http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
