上原 一樹 | 28 Nov 01:36 2014

Why does this helper-thread log message remain?

Hi,

Please help me understand something.
I understood that the helper threads were removed in the 2.1.4 → 2.2.0 transition.
Why does this log message still remain here?

slony1-2.2.0/src/slon/remote_worker.c

1695         if (provider->set_head == NULL)
1696         {
1697             /*
1698              * Tell this helper thread to exit, join him and destroy thread
1699              * related data.
1700              */
1701             slon_log(SLON_CONFIG, "remoteWorkerThread_%d: "
1702                      "helper thread for provider %d terminated\n",
1703                      node->no_id, provider->no_id);
...

Does this slon_log call only inform the user that the provider is empty (provider->set_head == NULL)?

regards,
uehara

-- 
上原 一樹
NTT OSS Center, DBMS group
Mail : uehara.kazuki@...
Phone: 03-5860-5115

Granthana Biswas | 30 Oct 09:59 2014

Repeated Slony-I: cleanup stale sl_nodelock entry for pid on subscribing new set from node 3

Hi All,

My replication setup is db1 -> db2 -> db3.

On adding a new set to the cluster, the merge goes into a waiting state, waiting for node 3 to subscribe. Because of this, node 3 is lagging behind.

These are the slonik commands that I used to add new set, subscribe and merge:
---------------------------------------------------------------------------------------------------------
create set ( id = 2, origin = 1, comment = 'replication set for surcharge table');

set add table (set id = 2, origin = 1, id = 1744, fully qualified name = 'public.t2', comment = 't2 table');
set add sequence (set id = 2, origin = 1, id = 1756, fully qualified name = 'public.s2', comment = 's2 sequence');

subscribe set ( id = 2, provider = 1, receiver = 2, forward = yes );
subscribe set ( id = 2, provider = 2, receiver = 3, forward = yes );

merge set ( id = 1, add id = 2, origin = 1);
-----------------------------------------------------------------------------------------------------------

Even though it goes into waiting mode, the sl_subscribe table shows the following:

 sub_set | sub_provider | sub_receiver | sub_forward | sub_active 
---------+--------------+--------------+-------------+------------
       1 |            1 |            2 | t           | t
       1 |            2 |            3 | t           | t
       2 |            1 |            2 | t           | t
       2 |            2 |            3 | t           | t


But the slony log on node 3 shows the following repeatedly:

NOTICE:  Slony-I: cleanup stale sl_nodelock entry for pid=29117
NOTICE:  Slony-I: cleanup stale sl_nodelock entry for pid=30115
NOTICE:  Slony-I: cleanup stale sl_nodelock entry for pid=30116
NOTICE:  Slony-I: cleanup stale sl_nodelock entry for pid=30414
NOTICE:  Slony-I: Logswitch to sl_log_2 initiated
CONTEXT:  SQL statement "SELECT  "_cluster".logswitch_start()"
PL/pgSQL function "cleanupevent" line 96 at PERFORM
NOTICE:  truncate of <NULL> failed - doing delete
NOTICE:  Slony-I: cleanup stale sl_nodelock entry for pid=31364
NOTICE:  Slony-I: cleanup stale sl_nodelock entry for pid=31369
NOTICE:  Slony-I: cleanup stale sl_nodelock entry for pid=31368
NOTICE:  Slony-I: cleanup stale sl_nodelock entry for pid=32300
NOTICE:  Slony-I: cleanup stale sl_nodelock entry for pid=1117
NOTICE:  Slony-I: cleanup stale sl_nodelock entry for pid=1149
NOTICE:  Slony-I: cleanup stale sl_nodelock entry for pid=1186
NOTICE:  Slony-I: cleanup stale sl_nodelock entry for pid=1247
NOTICE:  Slony-I: cleanup stale sl_nodelock entry for pid=1270
NOTICE:  Slony-I: cleanup stale sl_nodelock entry for pid=1294


It is continuously trying to clean up the stale nodelock entry for nl_nodeid=3 and nl_conncnt=0.

I tried stopping and starting the slon process for node 3, which didn't help. I don't see any errors in the other slony log files.
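For reference, this is roughly how I am checking whether those sl_nodelock entries really are stale on node 3. It is only a sketch: the column names come from the standard Slony-I sl_nodelock table, the schema name "_cluster" is taken from the NOTICEs above, and pg_stat_activity.pid assumes PostgreSQL 9.2 or later (use procpid on older releases):

    select nl.nl_nodeid, nl.nl_conncnt, nl.nl_backendpid,
           (sa.pid is not null) as backend_still_running
    from "_cluster".sl_nodelock nl
    left join pg_stat_activity sa on sa.pid = nl.nl_backendpid;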

Do I have to stop all slon processes of all nodes and start again?

Regards,
Granthana
Tory M Blue | 29 Oct 13:50 2014

finishTableAfterCopy(143); long running "index?"

So I've been watching paint dry since before midnight; it's now 5:44am PST. I have no idea whether slon is actually doing anything at this point. My slave node (I did an add) is quite large: the table it's working on was 52GB when it completed the transfer and is now 77GB, but it has been that size for over 2 hours (on-disk du).

I'm just not sure if it's stuck or actually doing something. I see this in my stats table, and I have a single CPU pegged at 99-100% that has been running for quite a while:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 54600 postgres  20   0 4617m 4.4g 2.0g R 99.7  1.7 399:57.68 postmaster



16398 | clsdb | 54600 | 10 | postgres | slon.remoteWorkerThread_1 | 10.13.200.232 | | 54260 | 2014-10-28 22:19:29.277022-07 | 2014-10-29 00:05:40.884649-07 | 2014-10-29 01:17:22.102619-07 | 2014-10-29 01:17:22.10262-07 | f | active | select "_cls".finishTableAfterCopy(143); analyze "torque"."iimpressions";

I now have 5 million rows backed up on the master. I'm watching to see if anything negative happens that would force me to drop this node, just so the master can truncate all that data, and then start again. I'd rather not do that if I can convince myself this is still working and that it may finish in the next couple of hours.

I've set maintenance_work_mem to 10GB to try to give this some room, but no change. This is a single process, but since it's an index creation, one would think it could use the 256GB of RAM to its advantage.
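For what it's worth, this is roughly what I've been polling to see whether the backend is still working rather than stuck on a lock. It is only a sketch: pid 54600 is the backend from the output above, and the column names assume a 9.2-era pg_stat_activity/pg_locks (older releases use procpid/current_query):

    -- is the backend still in an active transaction, and for how long?
    select pid, state, waiting, now() - xact_start as xact_age
    from pg_stat_activity
    where pid = 54600;

    -- is it blocked waiting on a lock rather than building the index?
    select locktype, relation::regclass as relation, mode, granted
    from pg_locks
    where pid = 54600 and not granted;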

Anyway, is there somewhere I can look in slon/postgres to see if it's doing anything more than pegging a CPU at 100% and making me crave sleep?

Thanks
Tory
Dave Cramer | 23 Oct 09:51 2014

Dropping a cluster blocked by a primary key

We have an old slave that will not let us drop a cluster.  The error is ERROR: "xxxxx_pkey" is an index.

Is there an easy way around this?

Dave Cramer | 16 Oct 12:19 2014

Lag time increasing but there are no events

I have a situation I can't explain.

sl_status shows lag time increasing. num events is 0, and the data is being replicated.
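For context, this is roughly what I'm watching (a sketch only; replace "_cluster" with the actual cluster name, and the columns are the usual ones from the sl_status view):

    select st_origin, st_received, st_lag_num_events, st_lag_time
    from "_cluster".sl_status;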

What exactly does lag_time represent?


Dave Cramer
Granthana Biswas | 14 Oct 12:15 2014

Changing master node's IP & port

Hi All,

I am trying to change the master node's IP and port.

After stopping the slon process, I ran store path as follows:

store path (server = 1, client = 2, conninfo='host=new_ip dbname=$DB1 user=$USER1 port=$PORT1');
store path (server = 1, client = 3, conninfo='host=new_ip dbname=$DB1 user=$USER1 port=$PORT1');


But on trying to start slon process again after changing the node config, I got the following error:

nohup /usr/bin/slon -d 2 -p /var/run/slony1/cl1node1.pid -f /home/postgres/slony_test/cl1_node1.conf > /home/postgres/slony_test/log/cl1node1.log 2>&1 &
[1] 5626
postgres <at> GB:~/slony_test$ 2014-10-14 13:09:13 IST ERROR:  duplicate key value violates unique constraint "sl_nodelock-pkey"
2014-10-14 13:09:13 IST DETAIL:  Key (nl_nodeid, nl_conncnt)=(1, 0) already exists.
2014-10-14 13:09:13 IST STATEMENT:  select "_Cluster1".cleanupNodelock(); insert into "_Cluster1".sl_nodelock values (    1, 0, "pg_catalog".pg_backend_pid());


I even tried drop path, but it gives the following error:

sh drop_path.sh
2014-10-14 13:52:06 IST ERROR:  Slony-I: Path cannot be dropped, subscription of set 1 needs it
2014-10-14 13:52:06 IST STATEMENT:  lock table "_Cluster1".sl_event_lock, "_Cluster1".sl_config_lock;select "_Cluster1".dropPath(1, 2);
<stdin>:8: PGRES_FATAL_ERROR lock table "_Cluster1".sl_event_lock, "_Cluster1".sl_config_lock;select "_Cluster1".dropPath(1, 2);  - ERROR:  Slony-I: Path cannot be dropped, subscription of set 1 needs it
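For reference, this is how I'm double-checking which conninfo is actually stored for node 1 (a sketch only; the columns are from the standard sl_path table):

    select pa_server, pa_client, pa_conninfo, pa_connretry
    from "_Cluster1".sl_path
    where pa_server = 1;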


Has anyone faced the same issue when trying to change the IP of a node? In my case, I am trying to change the IP of the master node.

Regards,
Granthana
Glyn Astill | 9 Oct 23:26 2014

Re: Error with Slony replication from another slony cluster slave

> From: Granthana Biswas <granthana.biswas@...>
>To: Glyn Astill <glynastill@...> 
>Sent: Thursday, 9 October 2014, 16:34
>Subject: Re: [Slony1-general] Error with Slony replication from another slony cluster slave
> 
>
>
>Hi Glyn,
>
>
>In my case I have two clusters:
>
>
>Cluster1 ->  replicating from DB1 -> DB2
>Cluster2 ->  replicating from DB1 -> DB3
>
>
>Can I stop Cluster2 and add DB3 to Cluster1 with DB2 as its master? Or do I have to delete the data first in DB3?
>
>

You'll want to run DROP NODE against each node in Cluster2 (or, if you're on 2.0+, you can get away with just DROP SCHEMA _Cluster2 CASCADE) and stop the slons for Cluster2.

Then just run through adding the node into Cluster1, i.e. STORE NODE and STORE PATH for DB3 on Cluster1, and then SUBSCRIBE SET with the provider as DB2.

As long as your schemas are as you want, the tables will be truncated on DB3 when you subscribe the sets.
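Roughly along these lines (just a sketch: it assumes DB2 is node 2 and DB3 is node 3 in Cluster1, set id 1 is an example, and the conninfo strings are placeholders you'll need to fill in):

    CLUSTER NAME = Cluster1;

    NODE 2 ADMIN CONNINFO = 'host=db2-host dbname=yourdb user=slony';
    NODE 3 ADMIN CONNINFO = 'host=db3-host dbname=yourdb user=slony';

    STORE NODE (ID = 3, COMMENT = 'DB3', EVENT NODE = 2);
    STORE PATH (SERVER = 2, CLIENT = 3, CONNINFO = 'host=db2-host dbname=yourdb user=slony');
    STORE PATH (SERVER = 3, CLIENT = 2, CONNINFO = 'host=db3-host dbname=yourdb user=slony');

    SUBSCRIBE SET (ID = 1, PROVIDER = 2, RECEIVER = 3, FORWARD = YES);

Paths are stored in both directions so that events and confirmations can flow both ways between DB2 and DB3.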

>I want to stop DB1 eventually and make DB2 the master.
>
>
>Regards,
>Granthana
>
>
>
>
>
>
>
>
>
>On Thu, Oct 9, 2014 at 7:32 PM, Glyn Astill <glynastill@...> wrote:
>
>________________________________
>>> From: Granthana Biswas <granthana.biswas@...>
>>>To: "slony1-general@..." <slony1-general@...>
>>>Sent: Thursday, 9 October 2014, 14:19
>>>Subject: [Slony1-general] Error with Slony replication from another slony cluster slave
>>>
>>>
>>>
>>>Hi Glyn,
>>>
>>>
>>>
>>>I am trying to set up a totally separate cluster for DB2 -> DB3  :
>>>
>>>Cluster1 -> replicating from DB1 to -> DB2
>>>
>>>
>>>Cluster2 -> replicating from DB2 to -> DB3
>>>
>>>
>>>Granthana
>>>
>>>
>>
>>As Jan has already stated, it's not possible to do this with 2 separate clusters because the logtriggers
won't fire on DB2 due to the session replication role.
>>
>>You want to get rid of Cluster2 and subscribe DB3 into Cluster1 with DB2 as it's provider.
>>
>>
>>
>>>On Thu, Oct 9, 2014 at 6:10 PM, Glyn Astill <glynastill@...> wrote:
>>>
>>>
>>>>
>>>>
>>>>
>>>>
>>>>>________________________________
>>>>> From: Granthana Biswas <granthana.biswas@...>
>>>>>To: slony1-general@...
>>>>>Sent: Thursday, 9 October 2014, 13:18
>>>>>Subject: [Slony1-general] Error with Slony replication from another slony    cluster slave
>>>>
>>>>>
>>>>>
>>>>>
>>>>>Hi All,
>>>>>
>>>>>
>>>>>I am trying to replicate from another Slony cluster's slave node.
>>>>>
>>>>>
>>>>>Cluster1 -> replicating from DB1 to -> DB2
>>>>>
>>>>>
>>>>>Cluster2 -> replicating from DB2 to -> DB3
>>>>>
>>>>>
>>>>>The initial sync up went fine without any errors. There are no errors in logs of both the clusters. Also
no st_lag_num_events in DB3 or DB2 for Cluster2.
>>>>>
>>>>>
>>>>>But the data added in DB2 since I started slony Cluster2 is not reflecting in DB3.
>>>>>
>>>>>
>>>>>Does slony allow replication from another cluster's slony slave? Or did I miss something?
>>>>>
>>>>>
>>>>
>>>>Perhaps you could show us what you've done so far, to me it's unclear if you are setting up a totally
seperate slony cluster for DB2->DB3, or just trying to subscribe DB3 to sets in the existing cluster.
>>>>
>>>>
>>>>If it's the latter, then as long as the slave you want to use as the provider was subscribed to the sets you
want with "FORWARD = YES" it should be ok
>>>>
>>>>
>>>>See: http://main.slony.info/documentation/stmtsubscribeset.html
>>>>
>>>>You should be able to check by looking at sub_forward in sl_subscribe, some thing like:
>>>>
>>>>select * from _<cluster name>.sl_subscribe where sub_receiver = <id of DB2>;
>>>>
>>>>
>>>>>Thanks & Regards,
>>>>>Granthana Biswas
>>
>
>
>
Granthana Biswas | 9 Oct 14:18 2014

Error with Slony replication from another slony cluster slave

Hi All,

I am trying to replicate from another Slony cluster's slave node. 

Cluster1 -> replicating from DB1 to -> DB2

Cluster2 -> replicating from DB2 to -> DB3

The initial sync went fine without any errors. There are no errors in the logs of either cluster, and no st_lag_num_events in DB3 or DB2 for Cluster2.

But data added to DB2 since I started slony for Cluster2 is not showing up in DB3.

Does slony allow replication from another cluster's slony slave? Or did I miss something?

Thanks & Regards,
Granthana Biswas


Vick Khera | 7 Oct 20:14 2014

Re: replicating from 9.3 to 8.4

Can you then just set the bytea_output setting on the 9.3 server to 'escape' for the duration?

Or perhaps set that on the slony user only?
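Something along these lines, for example (the database and role names here are just placeholders):

    alter database yourdb set bytea_output = 'escape';   -- everything in that database
    alter role slony set bytea_output = 'escape';        -- or just the role the slons connect as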

On Tue, Oct 7, 2014 at 6:51 AM, Dave Cramer <davecramer@...> wrote:
> I'm running into http://www.slony.info/bugzilla/show_bug.cgi?id=331
>
> The reason for copying from 9.3 to 8.4 is that an 8.4 is in the upgrade
> path.
>
> It will be temporary, but necessary.
>
> Dave Cramer
>
> _______________________________________________
> Slony1-general mailing list
> Slony1-general@...
> http://lists.slony.info/mailman/listinfo/slony1-general
>
Steve Singer | 3 Oct 19:40 2014

Re: Slony 2.1.4 - Issues re-subscribing provider when origin down

On 10/03/2014 08:27 AM, Glyn Astill wrote:
> Hi All,
>
> I'm looking at a slony setup using 2.1.4, with 4 nodes in the following
> configuration:
>
>      Node 1 --> Node 2
>      Node 1 --> Node 3 --> Node 4
>
> Node 1 is the origin of all sets, and node 3 is a provider of all to
> node 4.  What I'm looking to do is fail over to node 2 when both nodes 1
> and 3 have gone down.
>
> Is this possible?

Improvements in dealing with multiple nodes failing at once were one of the big changes in 2.2.

You might want to try something like

NODE 1 ADMIN CONNINFO = 'dbname=TEST host=localhost port=5432';
NODE 2 ADMIN CONNINFO = 'dbname=TEST host=localhost port=5433';
NODE 3 ADMIN CONNINFO = 'dbname=TEST host=localhost port=5434';

  FAILOVER (ID = 1, BACKUP NODE = 2);
  SUBSCRIBE SET (ID = 1, PROVIDER = 2, RECEIVER = 4, FORWARD = YES);

DROP NODE (ID = 3, EVENT NODE = 2);
DROP NODE (ID = 1, EVENT NODE = 2);

But I haven't tried to set up a cluster in this configuration, so I can't say for sure whether it will work or not. As a general comment, I think trying to reshape the cluster before the FAILOVER command will be problematic.

When I started doing a lot of failover tests with 2.1, I discovered a lot of cases that wouldn't work, or wouldn't work reliably. That led to the major failover changes in 2.2.

>
> In both a live environment that I've not had chance to move to 2.2 and
> my test environment I'm seeing the same issues, for my test environment
> the slonik script is:
>
>      CLUSTER NAME = test_replication;
>
>      NODE 1 ADMIN CONNINFO = 'dbname=TEST host=localhost port=5432
> user=slony';
>      NODE 2 ADMIN CONNINFO = 'dbname=TEST host=localhost port=5433
> user=slony';
>      NODE 3 ADMIN CONNINFO = 'dbname=TEST host=localhost port=5434
> user=slony';
>      NODE 4 ADMIN CONNINFO = 'dbname=TEST host=localhost port=5435
> user=slony';
>
>      SUBSCRIBE SET (ID = 1, PROVIDER = 2, RECEIVER = 4, FORWARD = YES);
>      WAIT FOR EVENT (ORIGIN = 2, CONFIRMED = 4, WAIT ON = 2);
>      SUBSCRIBE SET (ID = 2, PROVIDER = 2, RECEIVER = 4, FORWARD = YES);
>      WAIT FOR EVENT (ORIGIN = 2, CONFIRMED = 4, WAIT ON = 2);
>      SUBSCRIBE SET (ID = 3, PROVIDER = 2, RECEIVER = 4, FORWARD = YES);
>      WAIT FOR EVENT (ORIGIN = 2, CONFIRMED = 4, WAIT ON = 2);
>
>      DROP NODE (ID = 3, EVENT NODE = 2);
>
>      FAILOVER (
>          ID = 1, BACKUP NODE = 2
>      );
>
>      DROP NODE (ID = 1, EVENT NODE = 2);
>
> slonik is failing at the first subscribe set line as follows:
>
>      $ slonik test.scr
>      test.scr:8: could not connect to server: Connection refused
>          Is the server running on host "localhost" (127.0.0.1) and accepting
>          TCP/IP connections on port 5432?
>      test.scr:8: could not connect to server: Connection refused
>          Is the server running on host "localhost" (127.0.0.1) and accepting
>          TCP/IP connections on port 5434?
>      test.scr:8: could not connect to server: Connection refused
>          Is the server running on host "localhost" (127.0.0.1) and accepting
>          TCP/IP connections on port 5432?
>      Segmentation fault
>
> I get the same behaviour until I bring node 1 back up, then the script
> almost succeeds, but for an error
> stating that a record in sl_event already exists:
>
>      $ slonik ~/test.scr
>      ~/test.scr:8: could not connect to server: Connection refused
>          Is the server running on host "localhost" (127.0.0.1) and accepting
>          TCP/IP connections on port 5434?
>      waiting for events  (1,5000000172) only at (1,5000000162) to be
> confirmed on node 4
>      executing failedNode() on 2
>      ~/test.scr:17: NOTICE:  failedNode: set 1 has no other direct
> receivers - move now
>      ~/test.scr:17: NOTICE:  failedNode: set 2 has no other direct
> receivers - move now
>      ~/test.scr:17: NOTICE:  failedNode: set 3 has no other direct
> receivers - move now
>      ~/test.scr:17: NOTICE:  failedNode: set 1 has other direct
> receivers - change providers only
>      ~/test.scr:17: NOTICE:  failedNode: set 2 has other direct
> receivers - change providers only
>      ~/test.scr:17: NOTICE:  failedNode: set 3 has other direct
> receivers - change providers only
>      NOTICE: executing "_test_replication".failedNode2 on node 2
>      ~/test.scr:17: waiting for event (1,5000000175).  node 4 only on
> event 5000000162
> NOTICE: executing "_test_replication".failedNode2 on node 2
>      ~/test.scr:17: PGRES_FATAL_ERROR lock table
> "_test_replication".sl_event_lock,
> "_test_replication".sl_config_lock;select
> "_test_replication".failedNode2(1,2,2,'5000000174','5000000176');  -
> ERROR:  duplicate key value violates unique constraint "sl_event-pkey"
>      DETAIL:  Key (ev_origin, ev_seqno)=(1, 5000000176) already exists.
>      CONTEXT:  SQL statement "insert into "_test_replication".sl_event
>                  (ev_origin, ev_seqno, ev_timestamp,
>                  ev_snapshot,
>                  ev_type, ev_data1, ev_data2, ev_data3)
>                  values
>                  (p_failed_node, p_ev_seqfake, CURRENT_TIMESTAMP,
>                  v_row.ev_snapshot,
>                  'FAILOVER_SET', p_failed_node::text, p_backup_node::text,
>                  p_set_id::text)"
>      PL/pgSQL function
> _test_replication.failednode2(integer,integer,integer,bigint,bigint)
> line 14 at SQL statement
>      NOTICE: executing "_test_replication".failedNode2 on node 2
>      ~/test.scr:17: waiting for event (1,5000000177).  node 4 only on
> event 5000000175
>      ~/test.scr:21: begin transaction; -
>
>   After this sl_set on node 4 still has node 1 as the origin for one of
> the sets
>   (Is this possibly becasuse I'm not waiting properly or waiting on the
> wrong node?):
>
>      TEST=# table _test_replication.sl_set;
>       set_id | set_origin | set_locked |    set_comment
>      --------+------------+------------+-------------------
>            2 |          1 |            | Replication set 2
>        1 |          2 |            | Replication set 1
>            3 |          2 |            | Replication set 3
>      (3 rows)
>
> I've attached the slon logs if that would provide any better insight.
>
> Any help would be greatly appreciated.
>
> Thanks
> Glyn
>
Glyn Astill | 3 Oct 14:35 2014

Slony 2.1.4 - Issues re-subscribing provider when origin down

Hi All,

I'm looking at a slony setup using 2.1.4, with 4 nodes in the following configuration:

    Node 1 --> Node 2
    Node 1 --> Node 3 --> Node 4

Node 1 is the origin of all sets, and node 3 is the provider of all sets to node 4. What I'm looking to do is fail over to node 2 when both node 1 and node 3 have gone down.

Is this possible?

In both a live environment that I've not had a chance to move to 2.2 and my test environment I'm seeing the same issues; for my test environment the slonik script is:

    CLUSTER NAME = test_replication;

    NODE 1 ADMIN CONNINFO = 'dbname=TEST host=localhost port=5432 user=slony';
    NODE 2 ADMIN CONNINFO = 'dbname=TEST host=localhost port=5433 user=slony';
    NODE 3 ADMIN CONNINFO = 'dbname=TEST host=localhost port=5434 user=slony';
    NODE 4 ADMIN CONNINFO = 'dbname=TEST host=localhost port=5435 user=slony';

    SUBSCRIBE SET (ID = 1, PROVIDER = 2, RECEIVER = 4, FORWARD = YES);
    WAIT FOR EVENT (ORIGIN = 2, CONFIRMED = 4, WAIT ON = 2);
    SUBSCRIBE SET (ID = 2, PROVIDER = 2, RECEIVER = 4, FORWARD = YES);
    WAIT FOR EVENT (ORIGIN = 2, CONFIRMED = 4, WAIT ON = 2);
    SUBSCRIBE SET (ID = 3, PROVIDER = 2, RECEIVER = 4, FORWARD = YES);
    WAIT FOR EVENT (ORIGIN = 2, CONFIRMED = 4, WAIT ON = 2);

    DROP NODE (ID = 3, EVENT NODE = 2);

    FAILOVER (
        ID = 1, BACKUP NODE = 2
    );

    DROP NODE (ID = 1, EVENT NODE = 2);

slonik is failing at the first subscribe set line as follows:

    $ slonik test.scr
    test.scr:8: could not connect to server: Connection refused
        Is the server running on host "localhost" (127.0.0.1) and accepting
        TCP/IP connections on port 5432?
    test.scr:8: could not connect to server: Connection refused
        Is the server running on host "localhost" (127.0.0.1) and accepting
        TCP/IP connections on port 5434?
    test.scr:8: could not connect to server: Connection refused
        Is the server running on host "localhost" (127.0.0.1) and accepting
        TCP/IP connections on port 5432?
    Segmentation fault

I get the same behaviour until I bring node 1 back up; then the script almost succeeds, but for an error
stating that a record in sl_event already exists:

    $ slonik ~/test.scr
    ~/test.scr:8: could not connect to server: Connection refused
        Is the server running on host "localhost" (127.0.0.1) and accepting
        TCP/IP connections on port 5434?
    waiting for events  (1,5000000172) only at (1,5000000162) to be confirmed on node 4
    executing failedNode() on 2
    ~/test.scr:17: NOTICE:  failedNode: set 1 has no other direct receivers - move now
    ~/test.scr:17: NOTICE:  failedNode: set 2 has no other direct receivers - move now
    ~/test.scr:17: NOTICE:  failedNode: set 3 has no other direct receivers - move now
    ~/test.scr:17: NOTICE:  failedNode: set 1 has other direct receivers - change providers only
    ~/test.scr:17: NOTICE:  failedNode: set 2 has other direct receivers - change providers only
    ~/test.scr:17: NOTICE:  failedNode: set 3 has other direct receivers - change providers only
    NOTICE: executing "_test_replication".failedNode2 on node 2
    ~/test.scr:17: waiting for event (1,5000000175).  node 4 only on event 5000000162
    NOTICE: executing "_test_replication".failedNode2 on node 2
    ~/test.scr:17: PGRES_FATAL_ERROR lock table "_test_replication".sl_event_lock, "_test_replication".sl_config_lock;select "_test_replication".failedNode2(1,2,2,'5000000174','5000000176');  - ERROR:  duplicate key value violates unique constraint "sl_event-pkey"
    DETAIL:  Key (ev_origin, ev_seqno)=(1, 5000000176) already exists.
    CONTEXT:  SQL statement "insert into "_test_replication".sl_event
                (ev_origin, ev_seqno, ev_timestamp,
                ev_snapshot,
                ev_type, ev_data1, ev_data2, ev_data3)
                values
                (p_failed_node, p_ev_seqfake, CURRENT_TIMESTAMP,
                v_row.ev_snapshot,
                'FAILOVER_SET', p_failed_node::text, p_backup_node::text,
                p_set_id::text)"
    PL/pgSQL function _test_replication.failednode2(integer,integer,integer,bigint,bigint) line 14 at SQL statement
    NOTICE: executing "_test_replication".failedNode2 on node 2
    ~/test.scr:17: waiting for event (1,5000000177).  node 4 only on event 5000000175
    ~/test.scr:21: begin transaction; -

 After this, sl_set on node 4 still has node 1 as the origin for one of the sets
 (is this possibly because I'm not waiting properly, or waiting on the wrong node?):

    TEST=# table _test_replication.sl_set;
     set_id | set_origin | set_locked |    set_comment
    --------+------------+------------+-------------------
          2 |          1 |            | Replication set 2
          1 |          2 |            | Replication set 1
          3 |          2 |            | Replication set 3
    (3 rows)

I had attached the slon logs, but my mail to the list bounced; if they would provide any better insight I can provide them.

Any help would be greatly appreciated.

Thanks
Glyn

