Soni M | 17 Apr 05:44 2015

long 'idle in transaction' from remote slon

Hello All,
We have 2 nodes configured for Slony 2.0.7 on RHEL 6.5 with PostgreSQL 9.1.14; each slon manages its local postgres instance.
Slony and PostgreSQL were installed from the PostgreSQL yum repo.

On some occasions, on the master db, the cleanup event lasts a long time, up to 5 minutes, where it normally finishes in a few seconds. Most of that time is the 'truncate sl_log_x' waiting for a lock. While it waits, all write operations to postgres also have to wait, and some fail. On inspection, what makes the truncate wait is another transaction opened by the slave's slon process, the one that runs 'fetch 500 from LOG'. On some occasions that transaction is left 'idle in transaction' for a long time.

Why does this happen? Is it due to network latency between the nodes? Is there any workaround for this?
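
A quick way to see which session is holding the transaction open is pg_stat_activity. A minimal sketch, assuming PostgreSQL 9.1 (where the columns are procpid and current_query rather than the pid and state of 9.2+):

    -- list sessions sitting idle inside an open transaction, oldest first
    SELECT procpid, usename, client_addr, xact_start,
           now() - xact_start AS xact_age
    FROM pg_stat_activity
    WHERE current_query = '<IDLE> in transaction'
    ORDER BY xact_start;

Matching the reported procpid against the remote slon's connections confirms whether the 'fetch 500 from LOG' transaction is the one blocking the truncate.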

Many thanks, cheers...

--
Regards,

Soni Maula Harriz
David Fetter | 15 Apr 00:56 2015

Multiple slons per node pair?

Folks,

This came up in the context of making slony k-safe for some k>0.

Naively, a simple way to do this would be to have >1 machine, each
running all the slons for a cluster, replacing any machines that fail.

Would Bad Things™ happen as a consequence?

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Dave Cramer | 18 Mar 14:50 2015

replicating execute script

Due to the use of capital letters in the slony cluster, execute script fails.

I am looking to replicate execute script for DDL changes. From what I can see, execute script takes out a lock on sl_lock before executing the script and releases it at the end.

What else am I missing?

Dave Cramer
Clement Thomas | 23 Feb 16:51 2015

sl_log_1 and sl_log_2 tables not cleaned up

Hi All,
          we are facing a weird problem in our 3-node slony setup.

* node1 (db1.domain.tld) is the master provider, and node2
(db2.domain.tld) and node3 (db3.domain.tld) are subscribers.
The nodes currently have 5 replication sets and replication is working
fine.
* The problem is that the sl_log_1 and sl_log_2 tables on node1 get
cleaned up properly, but the tables on node2 and node3 don't. On node1
the total number of rows in sl_log_1 is 24845 and in sl_log_2 it is 0,
whereas:

node2:

                         relation                         |  size
----------------------------------------------------------+---------
 _mhb_replication.sl_log_2                                | 130 GB
 _mhb_replication.sl_log_2_idx1                           | 47 GB
 _mhb_replication.PartInd_mhb_replication_sl_log_2-node-1 | 30 GB

node3:
                         relation                         |  size
----------------------------------------------------------+--------
 _mhb_replication.sl_log_2                                | 133 GB
 _mhb_replication.sl_log_2_idx1                           | 47 GB
 _mhb_replication.PartInd_mhb_replication_sl_log_2-node-1 | 30 GB
 _mhb_replication.sl_log_1                                | 352 MB

On node2 and node3 we frequently see the following lines:

slon[20695]: [4031-1] FATAL  cleanupThread: "delete from
"_mhb_replication".sl_log_1 where log_origin = '1' and log_xid <
'2130551154'; delete from
slon[20695]: [4031-2]  "_mhb_replication".sl_log_2 where log_origin =
'1' and log_xid < '2130551154'; delete from
"_mhb_replication".sl_seqlog where
slon[20695]: [4031-3]  seql_origin = '1' and seql_ev_seqno <
'51449379'; select "_mhb_replication".logswitch_finish(); " - ERROR:
canceling statement
slon[20695]: [4031-4]  due to statement timeout
slon[20695]: [4032-1] DEBUG2 slon_retry() from pid=20695
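
It looks like the cleanup DELETEs are being killed by statement_timeout
before they can ever clear the backlog. A sketch of one workaround we
are considering, assuming slon connects as a role named slony (the role
name is an assumption; substitute whichever role your slons use):

    -- exempt the replication role from the global statement_timeout
    -- (role name "slony" is an assumption)
    ALTER ROLE slony SET statement_timeout = 0;
    -- new slon connections pick this up; restart the slons to apply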

Please find the slony_tools.conf here:
https://gist.github.com/clement1289/d928acb771ca01a89281 and the
sl_status / sl_listen output here:
https://gist.github.com/clement1289/88df40f77c03c691eee5 . Hoping for
some help.

Regards,
Clement
Mark Steben | 19 Feb 17:59 2015

Wish to run altperl scripts on master rather than slave

Good morning,

We are running the following on both master and slave (a simple 1-master, 1-slave configuration):
    postgresql 9.2.5
    slony1-2.2.2
     x86_64 GNU/Linux

We currently run altperl scripts to kill / start slon daemons from the slave:
   cd ...bin folder
   ./slon_kill -c .../slon_tools.....conf
         and
   ./slon_start -c ../slon_tools...conf 1 (and 2)

Because we need to run maintenance on the replicated db on the master
without slony running, I would like to run these commands on the master
before and after the maintenance. Since the daemons now run on the
slave, when I attempt to run these commands on the master the daemons
aren't found. Is there a prescribed way to accomplish this? I could
continue to run them on the slave and send a flag to the master when
complete, but I'd like to take a simpler approach if possible.
Any insight appreciated. Thank you.



--
Mark Steben
 Database Administrator
@utoRevenue | Autobase
  CRM division of Dominion Dealer Solutions 
95D Ashley Ave.
West Springfield, MA 01089

t: 413.327-3045
f: 413.383-9567

www.fb.com/DominionDealerSolutions
www.twitter.com/DominionDealer
 www.drivedominion.com
Mark Steben | 19 Feb 17:38 2015

Fwd: The results of your email commands

Greetings, I put in a question to slony1-general-request and got this back almost immediately.
Does this mean I'm in the queue or have I been bounced out?
Thx, Mark

---------- Forwarded message ----------
From: <slony1-general-bounces-8kkgcvHRObyz5F2/bZa4Fw@public.gmane.org>
Date: Thu, Feb 19, 2015 at 9:12 AM
Subject: The results of your email commands
To: mark.steben-UjXgi8GuFLL8esGaZs7s5AC/G2K4zDHf@public.gmane.org


The results of your email command are provided below. Attached is your
original message.

- Results:
    Ignoring non-text/plain MIME parts

- Unprocessed:
    (the plain-text body of the original message, quoted back)

- Ignored:
    (the signature block)

- Done.



---------- Forwarded message ----------
From: Mark Steben <mark.steben@drivedominion.com>
To: slony1-general-request-8kkgcvHRObyz5F2/bZa4Fw@public.gmane.org
Cc: 
Date: Thu, 19 Feb 2015 09:15:06 -0500
Subject: slon_kill, slon_start to run on master
(message body identical to the previous post)

--
Mark Steben
 Database Administrator
@utoRevenue | Autobase
  CRM division of Dominion Dealer Solutions 
95D Ashley Ave.
West Springfield, MA 01089

t: 413.327-3045
f: 413.383-9567

www.fb.com/DominionDealerSolutions
www.twitter.com/DominionDealer
 www.drivedominion.com
Tory M Blue | 5 Feb 19:19 2015

sl_log_1 not truncated, could not lock

2015-02-05 09:53:29 PST clsdb postgres 10.13.200.232(54830) 51877 2015-02-05 09:53:29.976 PSTNOTICE:  Slony-I: log switch to sl_log_2 complete - truncate sl_log_1

2015-02-05 10:01:34 PST clsdb postgres 10.13.200.231(46083) 42459 2015-02-05 10:01:34.481 PSTNOTICE:  Slony-I: could not lock sl_log_1 - sl_log_1 not truncated

Sooo, I have 13 million rows in sl_log_1, and from my checks of various tables things are replicated, but something still holds a lock on this table and it is not being truncated. These errors have been happening since 12:08 AM.

My sl_log_2 table now has 8 million rows, but I'm replicating and not adding much data. We did some massive deletes last night; 20 million rows was the last batch, around when things stopped switching and truncating.

Soooo, questions: how can I verify that sl_log_1 can be truncated (i.e. that everything in it has been replicated), and how can I figure out what is holding the lock so that slony can't truncate?
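
For the second question, a minimal sketch against pg_locks, with
_cluster standing in for the actual cluster schema (TRUNCATE needs
ACCESS EXCLUSIVE, so even an ACCESS SHARE lock held by a long-lived
transaction is enough to block it):

    -- sessions holding or awaiting locks on sl_log_1, oldest first
    -- (replace _cluster with your cluster schema)
    SELECT l.pid, l.mode, l.granted, a.state, a.xact_start, a.query
    FROM pg_locks l
    JOIN pg_stat_activity a ON a.pid = l.pid
    WHERE l.relation = '_cluster.sl_log_1'::regclass
    ORDER BY a.xact_start;

Any granted lock belonging to an old, idle transaction is a likely
culprit.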

I'm not hurting, just stressing at this point

Thanks
tory
Tory M Blue | 3 Feb 23:27 2015

Slony monitoring, management


Good afternoon,

I ran into an issue today where a node seemed to be lost. At 5am a single query node stopped replicating (although the logs and the slon process showed things were healthy), and my primary DB kept storing entries until we hit over 7 million rows, because my query db was not "doing something"; that something is a mystery. I finally dropped and re-added the node. This setup has been running stably since we went to 9.3.4 and slony 2.2.3. Nothing happened to the hardware; the query node was healthy but seemingly not replicating (my replication check showed that the queried table was not changing while the other nodes saw the change).

So anyways, I didn't really know how to check what data was in the sl_log tables (other than querying them), nor did I know of any way to verify that data was moving out and being replicated (other than my replication check). I wanted to know if there is something out there that peels the covers back on slon and sl_log, to verify that things are replicating in a timely fashion, to show where the data in sl_log is destined, and to tell whether a single host is holding up the show.

I just need more ways to check slon's health, progress, backlog, etc. Is there a front end somewhere that lets you see into the inner workings and states of Slony?
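
The simplest check I've found so far, run on the origin (with _cluster standing in for the cluster schema), is sl_status, which shows per subscriber how many events it is behind and how old its last confirmation is:

    -- per-subscriber backlog, worst first
    -- (replace _cluster with your cluster schema)
    SELECT st_origin, st_received, st_lag_num_events, st_lag_time
    FROM _cluster.sl_status
    ORDER BY st_lag_time DESC;

A lag that keeps growing for one st_received while the others stay near zero points at the node holding up the show, but I'd still like something with more visibility.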

Thanks
Tory
Glyn Astill | 29 Jan 18:01 2015

Slony 2.2.3 extended lag recovery

Hi All,

We're currently running slony 2.2.3 with 4 pg 9.0 nodes.  Occasionally since we upgraded from 2.1 I've been
seeing some "humps" where subscribers are lagging and taking an extended period of time to recover.

I can't ever reproduce it and I've come to a dead end.  I'm going to waffle a bit below, but I'm hoping someone
can see something I'm missing.

These humps appear to not really correlate with increased activity on the origin, and I've been struggling
to put my finger on anything aggravating the issue.  Today however I've seen the same symptoms, and the
start times of the lag align with an exclusive lock on a subscriber's replicated table whilst vacuum full
was run.

Whilst I'd expect that to cause some lag and a bit of a backlog, the vacuum full took only 2 minutes and the lag
builds up gradually afterwards.  Eventually after a long time replication will catch up, but it's out of
proportion to our transaction rate, and a restart of the slon on the subscriber causes it to catch up very
swiftly.  I've attached a graph of sl_status from the origin showing the time and event lag buildup and a
pretty swift slice on the end of the humps where I restart the slons.

The graph (attached) shows nodes 4 & 5 starting to lag first, as they were the first to have the vacuum full
run, then node 7 starts to lag when it has the same vacuum full run (at this point the lag on the two other nodes
hadn't been noticed).  This excerpt from one of the subscribers shows the copy being blocked:

2015-01-29 10:09:54 GMT [13246]: [39-1] app=slon.local_cleanup,user=slony,db=X,host=somehost
NOTICE:  Slony-I: Logswitch to sl_log_1 initiated
2015-01-29 10:09:54 GMT [13246]: [40-1] app=slon.local_cleanup,user=slony,db=X,host=somehost
CONTEXT:  SQL statement "SELECT "_main_replication".logswitch_start()"
2015-01-29 10:12:04 GMT [13243]: [9-1]
app=slon.remoteWorkerThread_8,user=slony,db=X,host=somehost LOG:  duration: 5089.684 ms 
statement: COPY "_main_replication"."sl_log_1" ( log_origin,
log_txid,log_tableid,log_actionseq,log_tablenspname, log_tablerelname, log_cmdtype,
log_cmdupdncols,log_cmdargs) FROM STDIN
2015-01-29 10:19:05 GMT [13243]: [10-1]
app=slon.remoteWorkerThread_8,user=slony,db=X,host=somehost LOG:  process 13243 still waiting
for RowExclusiveLock on relation 279233 of database 274556 after 1000.038 ms at character 13
2015-01-29 10:19:05 GMT [13243]: [11-1]
app=slon.remoteWorkerThread_8,user=slony,db=X,host=somehost QUERY:  UPDATE ONLY
"myschema"."table_being_full_vacuumed" SET "text" = $1 WHERE "address" = $2;
2015-01-29 10:19:05 GMT [13243]: [12-1]
app=slon.remoteWorkerThread_8,user=slony,db=X,host=somehost CONTEXT:  COPY sl_log_1, line 37: "8   
1108084090    2    1219750937    myschema    table_being_full_vacuumed    U    1    {text,"",address,some_address_data}"
2015-01-29 10:19:05 GMT [13243]: [13-1]
app=slon.remoteWorkerThread_8,user=slony,db=X,host=somehost STATEMENT:  COPY
"_main_replication"."sl_log_1" ( log_origin,
log_txid,log_tableid,log_actionseq,log_tablenspname, log_tablerelname, log_cmdtype,
log_cmdupdncols,log_cmdargs) FROM STDIN
2015-01-29 10:20:43 GMT [13243]: [14-1]
app=slon.remoteWorkerThread_8,user=slony,db=X,host=somehost LOG:  process 13243 acquired
RowExclusiveLock on relation 279233 of database 274556 after 98754.902 ms at character 13
2015-01-29 10:20:43 GMT [13243]: [15-1]
app=slon.remoteWorkerThread_8,user=slony,db=X,host=somehost QUERY:  UPDATE ONLY
"myschema"."table_being_full_vacuumed" SET "text" = $1 WHERE "address" = $2;
2015-01-29 10:20:43 GMT [13243]: [16-1]
app=slon.remoteWorkerThread_8,user=slony,db=X,host=somehost CONTEXT:  COPY sl_log_1, line 37: "8   
1108084090    2    1219750937    myschema    table_being_full_vacuumed    U    1    {text,"",address,some_address_data}"
2015-01-29 10:20:43 GMT [13243]: [17-1]
app=slon.remoteWorkerThread_8,user=slony,db=X,host=somehost STATEMENT:  COPY
"_main_replication"."sl_log_1" ( log_origin,
log_txid,log_tableid,log_actionseq,log_tablenspname, log_tablerelname, log_cmdtype,
log_cmdupdncols,log_cmdargs) FROM STDIN
2015-01-29 10:20:43 GMT [13243]: [18-1]
app=slon.remoteWorkerThread_8,user=slony,db=X,host=somehost LOG:  duration: 98915.154 ms 
statement: COPY "_main_replication"."sl_log_1" ( log_origin,
log_txid,log_tableid,log_actionseq,log_tablenspname, log_tablerelname, log_cmdtype,
log_cmdupdncols,log_cmdargs) FROM STDIN
2015-01-29 10:22:00 GMT [13246]: [41-1] app=slon.local_cleanup,user=slony,db=X,host=somehost
NOTICE:  Slony-I: log switch to sl_log_1 complete - truncate sl_log_2
2015-01-29 10:22:00 GMT [13246]: [42-1] app=slon.local_cleanup,user=slony,db=X,host=somehost
CONTEXT:  PL/pgSQL function "cleanupevent" line 94 at assignment
2015-01-29 10:34:01 GMT [13246]: [43-1] app=slon.local_cleanup,user=slony,db=X,host=somehost
NOTICE:  Slony-I: Logswitch to sl_log_2 initiated
2015-01-29 10:34:01 GMT [13246]: [44-1] app=slon.local_cleanup,user=slony,db=X,host=somehost
CONTEXT:  SQL statement "SELECT "_main_replication".logswitch_start()"
2015-01-29 10:46:08 GMT [13246]: [45-1] app=slon.local_cleanup,user=slony,db=X,host=somehost
NOTICE:  Slony-I: could not lock sl_log_1 - sl_log_1 not truncated
2015-01-29 10:46:08 GMT [13246]: [46-1] app=slon.local_cleanup,user=slony,db=X,host=somehost
CONTEXT:  PL/pgSQL function "cleanupevent" line 94 at assignment

After this the copies go through cycles of increasing and decreasing
duration, which I'm guessing is something normal (perhaps syncs being
grouped?). I'm also seeing messages stating "could not lock sl_log_1 -
sl_log_1 not truncated" a couple of times before the switch completes;
again I'm guessing this is just blocking from inserts capturing changes
and is normal? Autovacuum hasn't hit sl_log at all during this period.
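
For reference, the autovacuum observation is based on a check along
these lines against pg_stat_user_tables (a sketch of the kind of query,
using the _main_replication schema from the logs above):

    -- when autovacuum/analyze last touched the sl_log tables
    SELECT relname, last_autovacuum, last_autoanalyze, n_dead_tup
    FROM pg_stat_user_tables
    WHERE schemaname = '_main_replication'
      AND relname IN ('sl_log_1', 'sl_log_2');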

Does anyone have any ideas? I have debug logs from the slons, and
postgres logs, which I can send off list if anyone wants to dig deeper.

Thanks
Glyn
Stéphane Schildknecht | 27 Jan 19:31 2015

slony1-ctl 1.3.0 released

Hello,

The slony1-ctl development team is proud to announce version 1.3.0 of
slony1-ctl, a collection of shell scripts aimed at simplifying everyday
administration of a Slony replication cluster.

This version adds no new features, but brings compatibility with slony
2.2. The major changes are a better use of variables and a thorough
cleanup of the comments.

The project homepage:
  http://pgfoundry.org/projects/slony1-ctl/

The package may be downloaded at:
  http://pgfoundry.org/frs/download.php/3838/slony1-ctl-REL1_3_0.tar.gz

Best regards,
-- 
Stéphane Schildknecht
Contact régional PostgreSQL pour l'Europe francophone
Loxodata - Conseil, expertise et formations

Dave Cramer | 21 Jan 15:24 2015

Lots of data in sl_log_? but sl_status shows lag of 0

How is this possible?
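
The only explanation I can come up with is that every receiver has
already confirmed the events, so sl_status shows no lag, and the rows
in sl_log_? are simply waiting for the next log switch to be truncated.
A sketch of a confirmation check, with _cluster standing in for the
cluster schema:

    -- highest event each receiver has confirmed, per origin
    SELECT con_origin, con_received, max(con_seqno) AS last_confirmed
    FROM _cluster.sl_confirm
    GROUP BY con_origin, con_received
    ORDER BY con_origin, con_received;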

Dave Cramer