Samir Parikh | 17 Apr 03:06 2014

Slony setup error

I am getting the following error while trying to set up replication. Can somebody help?


./slonik_subscribe_set 1 2 | ./slonik

<stdin>:5: PGRES_FATAL_ERROR select max(ev_seqno) FROM "_replication".sl_event , "_replication".sl_node  where ev_origin="_replication".getLocalNodeId('_replication')  AND ev_type <> 'SYNC'  AND sl_node.no_id= ev_origin - ERROR:  function _replication.getlocalnodeid(unknown) does not exist

LINE 1: ...l_event , "_replication".sl_node  where ev_origin="_replicat...


HINT:  No function matches the given name and argument types. You might need to add explicit type casts.

error: unable to query event history on node 2

waiting for events  (2,221473472232) only at (2,0) to be confirmed on node 1
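For reference, the installed signatures of the function can be checked with something like the following (a diagnostic sketch; the "_replication" schema name is taken from the error above):

```sql
-- Diagnostic sketch: list the installed signatures of getLocalNodeId
-- in the "_replication" cluster schema (schema name taken from the error above).
SELECT p.proname,
       pg_get_function_identity_arguments(p.oid) AS args
FROM pg_proc p
JOIN pg_namespace n ON n.oid = p.pronamespace
WHERE n.nspname = '_replication'
  AND p.proname = 'getlocalnodeid';
```

If this returns no rows, the cluster schema on node 2 may be missing or from a mismatched Slony version.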


Thanks in advance,


Slony1-general mailing list
Christopher Browne | 15 Apr 16:57 2014

Administrivia: DMARC, Yahoo

I understand that Yahoo has implemented a DMARC policy that makes third-party servers reject mailing list traffic using Yahoo-based addresses as forged.

Here is an announcement of a "block Yahoo users" policy just implemented by another free software project, which seems to sum the issue up briefly but accurately.

FYI, we do have a number of Yahoo users on the Slony lists, and their receiving mail from the lists is not a problem.  Unfortunately, posting to these lists will cause problems, so I imagine we'll need to do a similar blocking of Yahoo addresses.  

Yahoo users that have an opinion should send comments directly to me off-list, as, due to the DMARC policy, forwarding of their opinions via the lists probably won't work out :-(.

Doubtless we'll have a more definitive declaration soon.  (I observe that Jan Wieck has moved his mail to a new domain over this :-) )
Sebastian Pawłowski | 14 Apr 14:37 2014

slony 2.2.2 - execute script and SET ID


Slony 2.2.2 no longer accepts SET ID as a valid option. What if I'm replicating sets with different tables, and some set is not being replicated to some nodes? What if I want to make some DDL changes to these tables? In earlier versions, DDL changes were replicated only to the proper nodes; now they go to all nodes.

I know I can make DDL changes directly, but EXECUTE SCRIPT was easier. Is there any better way to make DDL changes in such situations?
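For reference, this is the kind of change in question (a sketch; the set/node ids and filename are illustrative placeholders): in 2.1 the statement could name a set, while in 2.2 it only takes an event node.

```
# slonik sketch -- set/node ids and filename are placeholders.

# Slony 2.1 and earlier: the script followed the named set's subscribers.
execute script (set id = 2, filename = '/tmp/ddl.sql', event node = 1);

# Slony 2.2: no SET ID option; the script propagates cluster-wide.
execute script (filename = '/tmp/ddl.sql', event node = 1);
```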


Jan Wieck | 10 Apr 16:00 2014

Yahoo DMARC test

Let's see


Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
Steve Singer | 9 Apr 02:27 2014

Slony 2.0.8 released

The slony team is announcing the release of Slony 2.0.8.

Slony 2.0.8 includes the following changes from Slony 2.0.7

- Bug 230 - log_timestamps was always treated as true on some platforms (AIX)

- Include additional C headers required for PostgreSQL 9.2 (master)

- Bug 233 - Fix segfault when subscribing to a set that does not exist.

- Bug 236 - Fix default formatting of timestamp in logs

- Bug 260 - Fix issue with FAILOVER when failing over an origin with multiple sets.

- Bug 315 - Fixes to the compile-time include directories

The Slony team does not plan any additional releases of Slony 2.0.x.
Users are encouraged to upgrade to Slony 2.2.x.
Users are also reminded that Slony 2.0.x is not recommended for
PostgreSQL versions 9.1 or higher.

You can download Slony 2.0.8 at the following links 
Jeff Frost | 2 Apr 22:05 2014

help tuning to reduce replication lag

First, I have a question.  The docs here:

indicate that 0.978 seconds until close cursor means that processing took 0.978 seconds against the provider.

Does that mean this is how long it took to iterate over the entire cursor or that it took this long to iterate
over the cursor and perform the actual inserts/updates/deletes on the local node?  I'm guessing it's the
latter, but I want to make sure.

Also, I'm not quite clear on how to interpret the two numbers in the large tuples log output.

I ask because we have a 1.2 TB database being replicated with Slony 2.1, and while it usually cruises along just fine, on busy days we occasionally see ever-increasing lag that takes forever to catch up.

The servers in question are connected via gigabit and bandwidth utilization is quite low on the link. 

The provider has 256G of RAM and the subscriber has 80G of RAM.

sync_group_maxsize is the only non-default value in the slon config; it is currently set to 500, though we have tried various higher and lower values with seemingly little effect, probably because once you get over 100 it doesn't make much difference.
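For reference, the relevant knobs live in slon.conf; a sketch (the values shown are illustrative, not recommendations; desired_sync_time is the companion setting that lets slon size groups by target duration):

```
# slon.conf sketch -- values are illustrative, not recommendations.
sync_group_maxsize=500      # upper bound on how many SYNC events slon groups into one transaction
desired_sync_time=60000     # target duration (ms) slon aims for when sizing a SYNC group
```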

Anyone have further tuning suggestions?

If we look at our sync events in the logs, they look like this:

2014-04-02 11:00:18 PDT DEBUG1 remoteHelperThread_2_2: 0.987 seconds delay for first row
2014-04-02 11:16:59 PDT DEBUG1 remoteHelperThread_2_2: 1001.387 seconds until close cursor
2014-04-02 11:16:59 PDT DEBUG1 remoteHelperThread_2_2: inserts=44690 updates=133632 deletes=23148 truncates=0
2014-04-02 11:16:59 PDT DEBUG1 remoteWorkerThread_2: sync_helper timing:  pqexec (s/count)- provider 2.048/406 - subscriber 0.000/406
2014-04-02 11:16:59 PDT DEBUG1 remoteWorkerThread_2: sync_helper timing:  large tuples 1.315/1787
2014-04-02 11:17:07 PDT INFO   remoteWorkerThread_2: SYNC 5015991733 done in 1009.672 seconds
2014-04-02 11:17:07 PDT DEBUG1 remoteWorkerThread_2: SYNC 5015991733 sync_event timing:  pqexec (s/count)- provider 0.002/2 - subscriber 0.004/2 - IUD 1008.625/40297

2014-04-02 11:17:08 PDT DEBUG1 remoteHelperThread_2_2: 1.094 seconds delay for first row
2014-04-02 11:56:01 PDT DEBUG1 remoteHelperThread_2_2: 2334.126 seconds until close cursor
2014-04-02 11:56:01 PDT DEBUG1 remoteHelperThread_2_2: inserts=83628 updates=122325 deletes=25012 truncates=0
2014-04-02 11:56:01 PDT DEBUG1 remoteWorkerThread_2: sync_helper timing:  pqexec (s/count)- provider 2.213/465 - subscriber 0.000/465
2014-04-02 11:56:01 PDT DEBUG1 remoteWorkerThread_2: sync_helper timing:  large tuples 0.756/965
2014-04-02 11:56:04 PDT INFO   remoteWorkerThread_2: SYNC 5015992233 done in 2337.211 seconds
2014-04-02 11:56:04 PDT DEBUG1 remoteWorkerThread_2: SYNC 5015992233 sync_event timing:  pqexec (s/count)- provider 0.010/2 - subscriber 0.005/2 - IUD 2336.036/46197

2014-04-02 11:56:06 PDT DEBUG1 remoteHelperThread_2_2: 1.520 seconds delay for first row
2014-04-02 12:16:37 PDT DEBUG1 remoteHelperThread_2_2: 1232.814 seconds until close cursor
2014-04-02 12:16:37 PDT DEBUG1 remoteHelperThread_2_2: inserts=109353 updates=124460 deletes=41581 truncates=0
2014-04-02 12:16:37 PDT DEBUG1 remoteWorkerThread_2: sync_helper timing:  pqexec (s/count)- provider 2.663/554 - subscriber 0.000/554
2014-04-02 12:16:37 PDT DEBUG1 remoteWorkerThread_2: sync_helper timing:  large tuples 3.342/4818
2014-04-02 12:16:42 PDT INFO   remoteWorkerThread_2: SYNC 5015992733 done in 1237.954 seconds
2014-04-02 12:16:42 PDT DEBUG1 remoteWorkerThread_2: SYNC 5015992733 sync_event timing:  pqexec (s/count)- provider 0.002/2 - subscriber 0.005/2 - IUD 1236.358/55084

2014-04-02 12:16:44 PDT DEBUG1 remoteHelperThread_2_2: 1.339 seconds delay for first row
2014-04-02 12:25:28 PDT DEBUG1 remoteHelperThread_2_2: 525.458 seconds until close cursor
2014-04-02 12:25:28 PDT DEBUG1 remoteHelperThread_2_2: inserts=82400 updates=132333 deletes=61684 truncates=0
2014-04-02 12:25:28 PDT DEBUG1 remoteWorkerThread_2: sync_helper timing:  pqexec (s/count)- provider 2.351/556 - subscriber 0.000/556
2014-04-02 12:25:28 PDT DEBUG1 remoteWorkerThread_2: sync_helper timing:  large tuples 3.322/5009
2014-04-02 12:25:31 PDT INFO   remoteWorkerThread_2: SYNC 5015993233 done in 528.010 seconds
2014-04-02 12:25:31 PDT DEBUG1 remoteWorkerThread_2: SYNC 5015993233 sync_event timing:  pqexec (s/count)- provider 0.001/2 - subscriber 0.004/2 - IUD 526.604/55287

Christopher Browne | 2 Apr 22:28 2014

Fwd: help tuning to reduce replication lag

I replied; should also forward to the list...

---------- Forwarded message ----------
From: Christopher Browne <>
Date: Wed, Apr 2, 2014 at 4:28 PM
Subject: Re: [Slony1-general] help tuning to reduce replication lag
To: Jeff Frost <>

The 0.978 seconds is how long it took for the CURSOR to get to the point where it was able to provide the first row.

Given that the SYNCs seem to be taking on the order of 1000 seconds, that's not much overhead.

(In contrast, it would be distressing if it took 1219.35 seconds for the "delay for first row", and then the SYNC was completed in another 2 seconds.)

It's taking a thousand-ish seconds to process ~200K inserts/updates/deletes, which doesn't seem ludicrously out of line with what I'd expect.

It doesn't seem likely to me that the amount of memory you have is terribly relevant to performance; the processing of a stream of 200K-ish I/U/Ds won't be RAM-hungry. It's mostly hungry in:
a) Chewing CPU for the parsing and planning of each statement;
b) Chewing disk I/O for the processing of the I/U/Ds and logging updates in WAL.

I would expect Slony version 2.2 to be a fair bit quicker, as it uses COPY protocol to copy the data in, which dramatically reduces the amount of effort that the subscriber server needs to do parsing and planning the SQL for the INSERT/UPDATE/DELETE statements.

Tory M Blue | 19 Mar 23:27 2014

Secondary Site, Secondary slon cluster, best way to construct

I've gotten past most of my issues with replicating between locations. Most of them were network-related: timeouts, keepalive settings, etc.

But now I'm trying to figure out the best way to have two Slony clusters, such that one can be torn down without affecting the other.

So currently I have a 4 node cluster (Active Site A)

masterhost - node 1
slavehost - node 2
queryslavehost1 - node 3
queryslavehost2 - node 4

master and slave have full table replication (3 sets); these are switchover/failover candidates. masterhost also replicates to queryslavehost1 and queryslavehost2, but only 1 set.

I now want to bring up a second site with 4 nodes (I would love to have these be nodes 1-4 as well, but that's not going to work).

Site B:

dc2masterhost - node 5
dc2slavehost - node 6
dc2queryslavehost1 - node 7
dc2queryslavehost2 - node 8

I would like slavehost (node 2) to replicate to dc2masterhost (node 5), which would be a switchover/failover choice along with node 1 and node 2.

At the same time, I'd have nodes 6, 7, and 8 replicate from node 5, where node 6 is again a failover/switchover replica partner for node 5 (full 3-set replication), and have node 6 replicate to nodes 7 and 8 (1-set replication).
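A sketch of the kind of slonik commands involved for the Site B subscriptions (node numbers from the layout above; the cluster name, conninfo strings, and set id are illustrative placeholders):

```
# slonik sketch -- cluster name, conninfo strings, and set id are placeholders.
cluster name = mycluster;
node 5 admin conninfo = 'dbname=app host=dc2masterhost';
node 6 admin conninfo = 'dbname=app host=dc2slavehost';

# node 6 receives the set from node 5 and may forward it onward,
# which is what makes it usable as a failover/switchover partner for node 5
subscribe set (id = 1, provider = 5, receiver = 6, forward = yes);
```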

I really need a graphic to show this, I think. I believe I need to create multiple scripts to make this work, since at some point nodes 1-4 may go away entirely, leaving me with 5-8. That isn't an issue in itself; it's the period when both environments are up that I need to make sure I have configured right.

I'm just not sure what happens if I replicate all 3 sets from node 2 to node 5 (and I've got this working), but then need to tell node 6 to replicate from node 5 and nodes 7/8 to replicate from node 6. I'm not sure what happens with nodes 6-8 if I do a switchover between node 1/2 and node 5, or the reverse.

I guess I'm not entirely sure how each node knows where it needs to replicate from if things change in the cluster.

Thanks, and I'm more than willing to try to make this clearer! :)

Vick Khera | 14 Mar 15:07 2014

slon process "stops" after 58 minutes

So in prep for upgrading from 2.1 to 2.2 this weekend, I upgraded my server's OS from FreeBSD 9.1 to 9.2 (a fairly minor update, as OS updates go).

Since the upgrade, the slon connected to the replica DB on that upgraded server will stop after just about 58 to 59 minutes. Restarting the slon daemon allows the replication to continue and fairly quickly catch up.

By "stop" I mean there is nothing visibly going on -- replication stalls and nothing is logged.

Here is the tail end of my log file from a few minutes ago. The slon process was started at 8:48am.

2014-03-14 09:46:43.744319500 DEBUG1 calc sync size - last time: 1 last length: 2002 ideal: 29 proposed size: 3
2014-03-14 09:46:43.745342500 DEBUG1 about to monitor_subscriber_query - pulling big actionid list for 4
2014-03-14 09:46:43.749657500 INFO   remoteWorkerThread_4: syncing set 1 with 262 table(s) from provider 4
2014-03-14 09:46:43.762199500 DEBUG1 remoteHelperThread_4_4: 0.012 seconds delay for first row
2014-03-14 09:46:43.766863500 DEBUG1 remoteHelperThread_4_4: 0.016 seconds until close cursor
2014-03-14 09:46:43.766867500 DEBUG1 remoteHelperThread_4_4: inserts=266 updates=350 deletes=176 truncates=0
2014-03-14 09:46:43.766869500 DEBUG1 remoteWorkerThread_4: sync_helper timing:  pqexec (s/count)- provider 0.014/5 - subscriber 0.000/5
2014-03-14 09:46:43.766872500 DEBUG1 remoteWorkerThread_4: sync_helper timing:  large tuples 0.000/0
2014-03-14 09:46:44.006795500 INFO   remoteWorkerThread_4: SYNC 5015580475 done in 0.262 seconds
2014-03-14 09:46:44.006853500 DEBUG1 remoteWorkerThread_4: SYNC 5015580475 sync_event timing:  pqexec (s/count)- provider 0.001/2 - subscriber 0.005/2 - IUD 0.242/164

at this point nothing more gets logged.

Looking at the activity in the DB, I see the 5 connections from this slon, with all but one having a query start time of 09:46:44. This is the query that was running for over 10 minutes:

datid            | 16392
datname          | vkmlm
pid              | 7159
usesysid         | 16389
usename          | slony
application_name | slon.local_cleanup
client_addr      |
client_hostname  |
client_port      | 55142
backend_start    | 2014-03-14 08:48:16.198806-04
xact_start       |
query_start      | 2014-03-14 09:34:32.735557-04
state_change     | 2014-03-14 09:34:32.745553-04
waiting          | f
state            | idle
query            | begin;lock table "_mailermailer".sl_config_lock;select "_mailermailer".cleanupEvent('10 minutes'::interval);commit;

pg_cancel_backend() will not kill that query. I did a pg_terminate_backend() and it got rid of that process, but the rest are still seemingly stuck and nothing is logging from slon.
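For watching this kind of hang, something like the following against pg_stat_activity can show how long each slon connection has been sitting in its current state (a sketch; it assumes slon connects as role "slony", as in the output above, and uses the pre-9.6 boolean "waiting" column):

```sql
-- Sketch: list slon's connections and how long since each last changed state.
-- Assumes slon connects as role 'slony', per the pg_stat_activity row above.
SELECT pid, application_name, state,
       now() - state_change AS since_state_change,
       waiting
FROM pg_stat_activity
WHERE usename = 'slony'
ORDER BY state_change;
```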

Any ideas? This is so confusing because it is such an odd time interval before it locks up. What's magical about 58 minutes?

OS: FreeBSD 9.2/amd64
Slony1: 2.1.3
Postgres: 9.2.7
Vick Khera | 13 Mar 19:43 2014

question about upgrading 2.1 to 2.2 steps from manual

In the docs, a suggested method to upgrade 2.1 to 2.2 is as follows:

  1. Use SLONIK LOCK SET to lock all sets, so that no new changes are being injected into the log tables
  2. Set slon parameter slon_config_cleanup_interval to a very low value (a few seconds) so that the slon cleanup thread will trim data out of sl_log_1 and sl_log_2 immediately
  3. Restart slon for each node and let it run through a cleanup cycle to empty out sl_log_1 and sl_log_2 on all nodes.
  4. Verify that sl_log_1 and sl_log_2 are empty on all nodes in the cluster.
  5. Use SLONIK UPDATE FUNCTIONS against each node to upgrade to version 2.2
  6. Use SLONIK UNLOCK SET to unlock all sets
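In slonik terms, steps 1, 5, and 6 above might look something like this (a sketch; the cluster name, conninfo, and set/node ids are illustrative placeholders):

```
# slonik sketch of steps 1, 5 and 6 above -- cluster name, conninfo,
# and set/node ids are placeholders.
cluster name = mycluster;
node 1 admin conninfo = 'dbname=app host=master';

lock set (id = 1, origin = 1);      # step 1: stop new changes entering the log tables
# ... steps 2-4 (and installing the new binaries) happen here ...
update functions (id = 1);          # step 5: load the 2.2 stored functions on node 1
unlock set (id = 1, origin = 1);    # step 6
```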
Based on my experience and other reading, I think the following should go between steps 4 and 5:

  1. stop slon
  2. install new version of slony
then after step 5, restart slon.

Is that the correct place to perform the install? Or can/should that be done between steps 2 and 3?

Also, will the lock set survive restarting slon? I thought it took out an exclusive lock on all tables, but if you stop the process, the locks would then go away.
CS DBA | 26 Feb 19:39 2014


Hi all;

We have a client running PostGIS, and we're preparing to deploy Slony. However, the PostGIS meta tables geometry_columns and spatial_ref_sys do not seem to get deployed with primary keys (via the PostGIS install and enabling a db to be spatial).

Is anyone running replication across PostGIS-enabled databases? Do we need to add primary keys and replicate the above GIS meta tables? Or are these simply lookup/metadata tables that will be populated appropriately as stand-alone tables on both the master and the slave?
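If it turns out the meta tables do need to be replicated, one option is to add primary keys on their natural candidate columns (a sketch; it assumes a PostGIS version where geometry_columns is a plain table rather than a view, and that these columns are in fact unique in your installation):

```sql
-- Sketch: add primary keys so Slony can replicate the meta tables.
-- Assumes geometry_columns is a plain table (in newer PostGIS it is a view).
ALTER TABLE public.spatial_ref_sys
    ADD PRIMARY KEY (srid);

ALTER TABLE public.geometry_columns
    ADD PRIMARY KEY (f_table_catalog, f_table_schema,
                     f_table_name, f_geometry_column);
```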

Thanks in advance...