Mikko Partio | 1 Aug 2007 06:05

Re: Re: log shipping gone wrong



On 7/29/07, Jan Wieck <JanWieck@...> wrote:
On 7/24/2007 12:29 AM, Mikko Partio wrote:
>
>     What I meant is that it'd be good to get enough detailed information about
>     the specific actions that lead to this problem in order to create a
>     standalone test that can reproduce it.
>
>
> Ok here comes:

This one played quite a bit of hide and seek with me. But I think I found
and fixed the bug.

The problem occurred when some action (like STORE_SET) caused the
subscriber slon to perform an internal restart. During that, the current
setsync tracking in the pset structure is reinitialized. However, since
the sl_setsync table is not updated on events other than SYNC, this
would lead to the wrong (too low) old sync expected if the last event(s)
processed from that node were not SYNC events.

Thanks for the detailed example.
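
The failure mode described above can be pictured with a toy model. This is only a sketch of the explanation as written, not slon code: the class, attribute, and method names are invented for illustration, and it assumes the in-memory tracking advances on every processed event while the stored value advances only on SYNC.

```python
# Toy model only: the names below are hypothetical and do not correspond
# to identifiers in the slon source.
class ToySubscriber:
    def __init__(self):
        self.persisted_seqno = 0  # stands in for the row in sl_setsync
        self.tracked_seqno = 0    # stands in for the in-memory pset tracking

    def process(self, event_type, seqno):
        # Assumption: every processed event advances the in-memory tracking...
        self.tracked_seqno = seqno
        # ...but only SYNC events update the persisted value.
        if event_type == "SYNC":
            self.persisted_seqno = seqno

    def restart(self):
        # An internal restart reinitializes the tracking from the table.
        self.tracked_seqno = self.persisted_seqno


def demo():
    sub = ToySubscriber()
    sub.process("SYNC", 100)
    sub.process("STORE_SET", 101)  # non-SYNC event: table is not updated
    before = sub.tracked_seqno
    sub.restart()                  # tracking falls back to the stored value
    return before, sub.tracked_seqno
```

In this model the value tracked after the restart is lower than the last event actually processed, which is the "too low" old sync described above.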


Great that you got it fixed! Will the patch be included in the 1.2.11 release?

Regards

MP
_______________________________________________
Slony1-general mailing list
Slony1-general@...
http://lists.slony.info/mailman/listinfo/slony1-general
Stéphane Schildknecht | 1 Aug 2007 11:44

Re: Help with Slony- please read!

Benezra, Eric wrote:
>
> I am currently trying to use Slony on Windows XP. It is installed and
> scripts have been created and modified appropriately. When I try to
> run the Slony configuration script (slonik slonyconfigure.txt), I get
> the following error messages:
>
> slonyconfigure.txt:7 could not open file
> /usr/local/pgsql/share/xxid.v81.sql
> slonyconfigure.txt:7 ERROR: no admin conninfo for node 4017728
>
Is the file readable?
What's more, I don't think Windows can understand such a path.

My 2 cents

SAS
Christopher Browne | 1 Aug 2007 15:15

Re: Re: log shipping gone wrong

"Mikko Partio" <mpartio@...> writes:
> On 7/29/07, Jan Wieck <JanWieck@...> wrote:
>
>           On 7/24/2007 12:29 AM, Mikko Partio wrote:
>      >
>      >     What I meant is that it'd be good to get enough detailed information about
>      >     the specific actions that lead to this problem in order to create a
>      >     standalone test that can reproduce it.
>      >
>      >
>      > Ok here comes:
>      
>      This one played quite a bit of hide and seek with me. But I think I found
>      and fixed the bug.
>      
>      The problem occurred when some action (like STORE_SET) caused the
>      subscriber slon to perform an internal restart. During that, the current
>      setsync tracking in the pset structure is reinitialized. However, since
>      the sl_setsync table is not updated on events other than SYNC, this
>      would lead to the wrong (too low) old sync expected if the last event(s)
>      processed from that node were not SYNC events.
>      
>      Thanks for the detailed example.
>      
>
>
> Great that you got it fixed! Will the patch be included in the 1.2.11 release?
> Regards
> MP

Yes, it's in what will be 1.2.11.

I have not been able to get the log shipping test in the
tests/testlogship directory in the code base to exercise the problem.
Now, the patch is applied in CVS, so that either a checkout of the 1.2
branch, or application of the patch to a copy of source that you may
have would allow you to deploy it for testing, at least.

I'd be more than happy to make a tarball available for you, if that
would help you run a test to verify that this fixes the problem for
you.
-- 
output = reverse("ofni.sesabatadxunil" " <at> " "enworbbc")
http://linuxdatabases.info/info/wp.html
"Consistency requires you to be as  ignorant today as  you were a year
ago."  -- Bernard Berenson
Laurent Raufaste | 1 Aug 2007 18:30

Slony lag times

Hi,

We are using Slony on a production environment and are very pleased by it.

Our cluster is made of 1 master, 4 slaves that need to be replicated
fast, and 2 slaves for which the replication speed isn't a problem.

Here's our issue: In the sl_status view I notice that the st_lag_time is
always between 1 and many seconds: it goes up to 10 seconds regularly,
and approximately once a day, there is always a slave reaching 1
min, for example while vacuuming.

I tried playing with the following options:
     -s <milliseconds>     SYNC check interval (default 10000)
     -t <milliseconds>     SYNC interval timeout (default 60000)
     -o <milliseconds>     desired subscriber SYNC processing time
     -g <num>              maximum SYNC group size (default 6)

Now on the master I have:
-s 1000 -g 50
On the fast slaves I have:
-s 1000
And on the slow slaves:
-s 10000 -g 10

I tried lowering the SYNC check interval to 500ms with no real effect,
and the master is already loaded enough anyway ;)

Is there an effective way to shorten the replication lag time?

A Slony noob.

-- 
Laurent Raufaste
JFG Networks
<http://www.over-blog.com/>
aitali lahcen | 1 Aug 2007 18:38

(no subject)

Jeff Frost | 1 Aug 2007 19:22

problems with slony1-1.2.10 and postgresql-8.2.4

Hi folks,

In an effort to upgrade a cluster from slony1-1.2.6 and postgresql-8.1.8, I 
dumped the DB, upgraded to postgresql-8.2.4, compiled and installed 
slony1-1.2.10 and then tried to restore the DB on the master.  The restore 
crashed the backend every time when trying to restore our slony_test table. So 
I opted to go a different route and installed 8.1.9, moved the old data 
directory in, dropped the slony schema cascade and dumped the db.  Then I 
restored that into 8.2.4 successfully.  After upgrading the slave to 8.2.4 and 
1.2.10, I tried to subscribe the slave, but the slonik command errored out and 
left the following in the postgresql log:

Jul 29 15:48:49 testdb1 postgres[1173]: [1-1] ERROR:  could not access file
"$libdir/xxid": No such file or directory
Jul 29 15:48:49 testdb1 postgres[1173]: [1-2] STATEMENT:  load
'$libdir/xxid';

This is what slonik complained about:

<stdin>:17: PGRES_FATAL_ERROR load '$libdir/xxid'; - ERROR: incompatible 
library "/usr/lib64/pgsql/xxid.so": missing magic block

Except, I don't know why it's looking in /usr/lib64/pgsql because this is a 32 
bit CentOS 4.3 machine.  I double checked that the i686 versions of postgres 
are installed, slony was compiled from source against /usr/bin/pg_config which 
correctly outputs that libdir is /usr/lib/pgsql:
PKGLIBDIR = /usr/lib/pgsql

What am I missing?  Is there a special procedure for using Postgresql-8.2.4?

It yielded the same results with slony 1.2.8 and 1.2.9.  But, 1.2.10 is working 
fine with postgresql-8.1.9 which we reverted to.

-- 
Jeff Frost, Owner 	<jeff@...>
Frost Consulting, LLC 	http://www.frostconsultingllc.com/
Phone: 650-780-7908	FAX: 650-649-1954
Mikko Partio | 1 Aug 2007 20:05

Re: Re: log shipping gone wrong



On 8/1/07, Christopher Browne <cbbrowne@...> wrote:

I'd be more than happy to make a tarball available for you, if that
would help you run a test to verify that this fixes the problem for
you.


Yes sure if you're willing to roll out a pre-release I can try it out.

regards

MP


Christopher Browne | 1 Aug 2007 21:43

Testing on 1.2.11 - So Far So Good

I have run what-soon-should-be-1.2.11 against PostgreSQL 7.4, 8.0, 8.1, 
8.2, and a (sadly, aging!) 8.3 (probably a few weeks old :-().  All 
looking good, so far.

I'll see about getting a tarball put together to generate a "pre-1.2.11" 
so that Mikko Partio and others can do any further testing that is 
desired before the release.
Christopher Browne | 1 Aug 2007 21:54

Re: Slony lag times

Laurent Raufaste wrote:
> Hi,
>
> We are using Slony on a production environment and are very pleased by 
> it.
>
> Our cluster is made of 1 master, 4 slaves that need to be replicated
> fast, and 2 slaves for which the replication speed isn't a problem.
>
> Here's our issue: In the sl_status view I notice that the st_lag_time is
> always between 1 and many seconds: it goes up to 10 seconds regularly,
> and approximately once a day, there is always a slave reaching 1
> min, for example while vacuuming.
>
If the problem is essentially that the master is overloaded, then there 
isn't any configuration change likely to help.  There is no guaranteeing 
that subscribers will be Right Up To Date.
> I tried playing with the following options:
>     -s <milliseconds>     SYNC check interval (default 10000)
>     -t <milliseconds>     SYNC interval timeout (default 60000)
>     -o <milliseconds>     desired subscriber SYNC processing time
>     -g <num>              maximum SYNC group size (default 6)
>
> Now on the master I have:
> -s 1000 -g 50
> On the fast slaves I have:
> -s 1000
> And on the slow slaves:
> -s 10000 -g 10
>
> I tried lowering the SYNC check interval to 500ms with no real effect,
> and the master is already loaded enough anyway ;)
There are seeming misconceptions in how you're configuring it...

- On the origin node, the "-g" option is pretty much irrelevant.  -g 
affects how *subscribers* group together SYNCs into groups; since the 
origin does not apply any SYNCs coming from subscribers, it is 
irrelevant to do any grouping there.

- On the other hand, the "-s" (and "-t") options mainly have a 
meaningful effect on an origin node, as those are the options which 
control how often SYNCs are generated.  (SYNCs get generated on 
non-origin nodes, but those SYNCs don't lead to any replication work 
being done, so they're very uninteresting.)  Thus, setting "-s 1000" 
versus "-s 10000" on subscriber nodes is pretty much irrelevant.
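
The two points above can be sketched as a small helper that keeps only the options that matter for a given role. This is a rough illustration, not part of Slony: the function name and its parameters are hypothetical, and only the -s, -t, -g and -o flags come from the option list quoted in this thread. The cluster name and conninfo arguments that slon also needs are left out.

```python
def slon_args(role, sync_check_ms=None, sync_timeout_ms=None,
              group_size=None, desired_sync_ms=None):
    """Build the role-relevant part of a slon option list.

    Hypothetical helper: 'role' is "origin" or "subscriber". Options that
    the explanation above calls irrelevant for that role are dropped.
    """
    if role not in ("origin", "subscriber"):
        raise ValueError("role must be 'origin' or 'subscriber'")
    args = []
    if role == "origin":
        # -s / -t control how often SYNCs are generated on the origin.
        if sync_check_ms is not None:
            args += ["-s", str(sync_check_ms)]
        if sync_timeout_ms is not None:
            args += ["-t", str(sync_timeout_ms)]
    else:
        # -g / -o affect how a subscriber groups and paces incoming SYNCs.
        if group_size is not None:
            args += ["-g", str(group_size)]
        if desired_sync_ms is not None:
            args += ["-o", str(desired_sync_ms)]
    return args
```

For example, slon_args("origin", sync_check_ms=1000, group_size=50) keeps only "-s 1000", while slon_args("subscriber", sync_check_ms=10000, group_size=10) keeps only "-g 10".
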
> Is there an effective way to shorten the replication lag time?
>
> A Slony noob.
>
Speed is an "emergent property," arising from how much work is being 
thrown at the system (e.g. - what are you doing to overload it) and how 
much hardware there is to do replication work.

Tuning the DBMS, tuning OS, tuning hardware, and such, will have some 
effect.  In the long run, though, the main way is to get faster network 
and disk hardware...
David Rees | 2 Aug 2007 02:23

Re: problems with slony1-1.2.10 and postgresql-8.2.4

On 8/1/07, Jeff Frost <jeff@...> wrote:
> <stdin>:17: PGRES_FATAL_ERROR load '$libdir/xxid'; - ERROR: incompatible
> library "/usr/lib64/pgsql/xxid.so": missing magic block
>
> Except, I don't know why it's looking in /usr/lib64/pgsql because this is a 32
> bit CentOS 4.3 machine.  I double checked that the i686 versions of postgres
> are installed, slony was compiled from source against /usr/bin/pg_config which
> correctly outputs that libdir is /usr/lib/pgsql:
> PKGLIBDIR = /usr/lib/pgsql
>
> What am I missing?  Is there a special procedure for using Postgresql-8.2.4?
>
> It yielded the same results with slony 1.2.8 and 1.2.9.  But, 1.2.10 is working
> fine with postgresql-8.1.9 which we reverted to.

Weird. I haven't seen that before, but I haven't been using slony for
that long, either. I have multiple servers running a combination of
CentOS 4.5 (32bit & 64bit), CentOS 5 and Fedora Core 6 using
PostgreSQL 8.2.4 and Slony 1.2.10.

I've also upgraded PostgreSQL from a previous 8.2.x version as well as
Slony from a previous 1.2.x version on them without any issues.

When you upgrade, are you recompiling/reinstalling both Postgres and
also Slony against the upgraded Postgres? I suspect that you may be
missing the Slony modules in your Postgres install.
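
One rough way to see which copies of xxid.so are lying around, and whether each was built against 8.2-style headers, is sketched below. It is a heuristic only: it assumes a module compiled with PG_MODULE_MAGIC contains the symbol name Pg_magic_func somewhere in the file, and the helper names are made up for illustration. The two directories in the usage note are the ones mentioned in this thread.

```python
import os

# Assumption: PG_MODULE_MAGIC emits a function with this name, so its
# name shows up as a byte string inside the compiled module.
MAGIC_SYMBOL = b"Pg_magic_func"


def has_magic_block(path):
    """Heuristic check: does the file contain the magic-block symbol name?"""
    with open(path, "rb") as f:
        return MAGIC_SYMBOL in f.read()


def find_modules(dirs, name="xxid.so"):
    """Return the paths under 'dirs' where a file called 'name' exists."""
    found = []
    for d in dirs:
        candidate = os.path.join(d, name)
        if os.path.isfile(candidate):
            found.append(candidate)
    return found
```

Running find_modules(["/usr/lib/pgsql", "/usr/lib64/pgsql"]) and calling has_magic_block() on each hit would show whether a stale module from an older build is still present in one of the directories.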

When I upgrade I put Postgres / Slony into a unique directory for each
build (for example /usr/local/pgsql-8.2.4-2007080101 and
/usr/local/slony-1.2.10-2007080101) and then symlink that folder to
the same folder name without the version so I don't have to bother
updating paths and startup scripts.
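
The symlink arrangement described above could be scripted along these lines. It is a minimal sketch assuming a POSIX system; the function name is hypothetical, and the directory names in the usage note are just the examples from this message.

```python
import os


def point_symlink(link_path, target_dir):
    """Make link_path a symlink to target_dir, replacing any old link.

    The new link is created under a temporary name and then renamed over
    the old one, so there is no moment when link_path is missing.
    """
    tmp = link_path + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)              # clear a leftover from a failed run
    os.symlink(target_dir, tmp)
    os.replace(tmp, link_path)      # rename over the existing symlink
```

For example, point_symlink("/usr/local/pgsql", "/usr/local/pgsql-8.2.4-2007080101") would repoint the unversioned name at the new build, leaving paths and startup scripts unchanged.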

When updating a minor version of pgsql I haven't had to
recompile/reinstall slony as long as I remember to copy the slony
modules over to the new pgsql directory.

I also notice that you're not fully updated with CentOS. Is there a
reason you haven't updated to CentOS 4.5? I doubt this has
anything to do with your issue, but it can't hurt.

-Dave
