Clement Thomas | 23 Feb 16:51 2015

sl_log_1 and sl_log_2 tables not cleaned up

Hi All,
          we are facing a weird problem in our three-node Slony setup.

* node1 (db1.domain.tld) is the master provider, and node2
(db2.domain.tld) and node3 (db3.domain.tld) are subscribers. The nodes
currently have 5 replication sets, and replication is working fine.
* the problem is that the sl_log_1 and sl_log_2 tables on node1 get
cleaned up properly, but the tables on node2 and node3 do not. On node1
the total number of rows in sl_log_1 is 24845 and in sl_log_2 it is 0,
whereas:

node2:

                         relation                         |  size
----------------------------------------------------------+---------
 _mhb_replication.sl_log_2                                | 130 GB
 _mhb_replication.sl_log_2_idx1                           | 47 GB
 _mhb_replication.PartInd_mhb_replication_sl_log_2-node-1 | 30 GB

node3:
                         relation                         |  size
----------------------------------------------------------+--------
 _mhb_replication.sl_log_2                                | 133 GB
 _mhb_replication.sl_log_2_idx1                           | 47 GB
 _mhb_replication.PartInd_mhb_replication_sl_log_2-node-1 | 30 GB
 _mhb_replication.sl_log_1                                | 352 MB
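
As far as I understand it, the cleanup thread can only trim log rows that
every node has already confirmed, so presumably the thing to check is the
per-node confirmation state, with something like:

    -- a rough check against our _mhb_replication schema: an (origin,
    -- receiver) pair whose last confirmed event is far behind the others
    -- would pin the log tables on every node
    SELECT con_origin, con_received, max(con_seqno) AS last_confirmed
      FROM _mhb_replication.sl_confirm
     GROUP BY con_origin, con_received
     ORDER BY con_origin, con_received;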

On node2 and node3 we can see the following lines frequently.

(Continue reading)

Mark Steben | 19 Feb 17:59 2015

Wish to run altperl scripts on master rather than slave

Good morning,

We are running the following on both master and slave (a simple 1 master to 1 slave configuration):
    postgresql 9.2.5
    slony1-2.2.2
     x86_64 GNU/Linux

We currently run altperl scripts to kill / start slon daemons from the slave:
   cd ...bin folder
   ./slon_kill -c .../slon_tools.....conf
         and
   ./slon_start -c ../slon_tools...conf 1 (and 2)

Because we need to run maintenance on the replicated db on the master without Slony running, I would like to run these commands on the master before and after the maintenance. Since the daemons now run on the slave, when I attempt to run these commands on the master the daemons aren't found. Is there a prescribed way to accomplish this? I could continue to run them on the slave and send a flag to the master when complete, but I'd like to take a simpler approach if possible.
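
Before stopping the daemons for the maintenance I'd also want to be sure
the slave has caught up. Something like the following, run on the master,
should show it (_replication here stands in for our actual cluster schema):

    -- per-receiver lag as seen from the origin; near-zero values mean
    -- it should be safe to stop the slons and start the maintenance
    SELECT st_received, st_lag_num_events, st_lag_time
      FROM _replication.sl_status;
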
 Any insight appreciated.  Thank you.



--
Mark Steben
 Database Administrator
@utoRevenue | Autobase
  CRM division of Dominion Dealer Solutions 
95D Ashley Ave.
West Springfield, MA 01089

t: 413.327-3045
f: 413.383-9567

www.fb.com/DominionDealerSolutions
www.twitter.com/DominionDealer
 www.drivedominion.com

_______________________________________________
Slony1-general mailing list
Slony1-general@...
http://lists.slony.info/mailman/listinfo/slony1-general
Mark Steben | 19 Feb 17:38 2015

Fwd: The results of your email commands

Greetings, I put in a question to slony1-general-request and got this back
almost immediately. Does this mean I'm in the queue, or have I been bounced?
Thanks, Mark

---------- Forwarded message ----------
From: <slony1-general-bounces-8kkgcvHRObyz5F2/bZa4Fw@public.gmane.org>
Date: Thu, Feb 19, 2015 at 9:12 AM
Subject: The results of your email commands
To: mark.steben-UjXgi8GuFLL8esGaZs7s5AC/G2K4zDHf@public.gmane.org


The results of your email command are provided below. Attached is your
original message.

- Results:
    Ignoring non-text/plain MIME parts

- Unprocessed:
    We are running the following on both master and slave: (a simple 1 master
    to 1 slave configuration)
        postgresql 9.2.5
        slony1-2.2.2
         x86_64 GNU/Linux
    We currently run altperl scripts to kill / start slon daemons from the
    slave:
       cd ...bin folder
       ./slon_kill -c .../slon_tools.....conf
             and
       ./slon_start -c ../slon_tools...conf 1 (and 2)
     Because we need to run maintenance on the replicated db on the master
    without slony running I would like to run these commands on the master
    before and after the maintenance.  Since the daemons now run on the slave
    when I attempt to run these commands on the master the daemons aren't
    found.  Is
    there a prescribed way to accomplish this?  I could continue to run them on
    the
    slave and send a flag to the master when complete but I'd like to take a
    simpler approach if possible.
     Any insight appreciated.  Thank you.

- Ignored:


    --
    *Mark Steben*
     Database Administrator
    @utoRevenue <http://www.autorevenue.com/> | Autobase
    <http://www.autobase.net/>
      CRM division of Dominion Dealer Solutions
    95D Ashley Ave.
    West Springfield, MA 01089
    t: 413.327-3045
    f: 413.383-9567

    www.fb.com/DominionDealerSolutions
    www.twitter.com/DominionDealer
     www.drivedominion.com <http://www.autorevenue.com/>

    <http://autobasedigital.net/marketing/DD12_sig.jpg>

- Done.

--
Mark Steben
 Database Administrator
@utoRevenue | Autobase
  CRM division of Dominion Dealer Solutions 
95D Ashley Ave.
West Springfield, MA 01089

t: 413.327-3045
f: 413.383-9567

www.fb.com/DominionDealerSolutions
www.twitter.com/DominionDealer
 www.drivedominion.com

_______________________________________________
Slony1-general mailing list
Slony1-general@...
http://lists.slony.info/mailman/listinfo/slony1-general
Tory M Blue | 5 Feb 19:19 2015

sl_log_1 not truncated, could not lock

2015-02-05 09:53:29 PST clsdb postgres 10.13.200.232(54830) 51877 2015-02-05 09:53:29.976 PSTNOTICE:  Slony-I: log switch to sl_log_2 complete - truncate sl_log_1

2015-02-05 10:01:34 PST clsdb postgres 10.13.200.231(46083) 42459 2015-02-05 10:01:34.481 PSTNOTICE:  Slony-I: could not lock sl_log_1 - sl_log_1 not truncated

Sooo, I have 13 million rows in sl_log_1, and from my checks of various tables things are replicated, but this table still has a lock and is not being truncated. These errors have been happening since 12:08 AM.

My sl_log_2 table now has 8 million rows, but I'm replicating and not adding a bunch of data. We did some massive deletes last night; 20 million was the last batch, when things seemed to stop switching and truncating.

Soooo, questions: how can I verify sl_log_1 can be truncated (that everything in it has been replicated), and how can I figure out what is holding the lock that keeps Slony from truncating?
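
For what it's worth, the closest I've come on the locking question is poking at pg_locks: the truncate needs an ACCESS EXCLUSIVE lock, so even an idle-in-transaction session holding any lock on the table will block it. Something like this (with _cls standing in for the real cluster schema):

    -- which backends hold (or are waiting for) locks on sl_log_1?
    SELECT l.pid, l.mode, l.granted, a.state, a.query_start, a.query
      FROM pg_locks l
      JOIN pg_stat_activity a ON a.pid = l.pid
     WHERE l.relation = '_cls.sl_log_1'::regclass;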

I'm not hurting, just stressing at this point.

Thanks
tory
_______________________________________________
Slony1-general mailing list
Slony1-general@...
http://lists.slony.info/mailman/listinfo/slony1-general
Tory M Blue | 3 Feb 23:27 2015

Slony monitoring, management


Good afternoon,

I ran into an issue today where a node seemed to be lost. At 5am, a single query node stopped replicating (although the logs and slon process showed things were healthy), but my primary DB kept storing entries until we hit over 7 million rows, because my query db was not "doing something"; that something is a mystery. I finally dropped and re-added the node. This had been running stable since we went to 9.3.4 and Slony 2.2.3. Nothing happened to the hardware; the query node was healthy but seemingly not replicating (my replication check showed that the queried table was not changing, while the other nodes saw the change).

So anyway, I didn't really know how to check what data was in the sl_log tables (other than querying them), nor did I know of any way to verify that data was moving out and being replicated (other than my replication check). I wanted to know if there is something out there that peels the covers back on slon and sl_log, to verify that things are replicating in a timely manner, to show where the data in sl_log is destined, and to tell whether a single host is holding up the show.
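
The lowest-level checks I could come up with (with _cls standing in for the real cluster schema) were to ask the log table and the sl_status view directly:

    -- backlog still sitting in the active log table, per origin ...
    SELECT log_origin, count(*) AS pending_rows
      FROM _cls.sl_log_1
     GROUP BY log_origin;

    -- ... and, on the origin, how far behind each receiver is
    SELECT st_received, st_lag_num_events, st_lag_time
      FROM _cls.sl_status;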

I just need more ways to check slon's health, progress, backlog, etc. Is there a front end somewhere that will let you see into the inner workings and state of Slony?

Thanks
Tory
_______________________________________________
Slony1-general mailing list
Slony1-general@...
http://lists.slony.info/mailman/listinfo/slony1-general
Glyn Astill | 29 Jan 18:01 2015

Slony 2.2.3 extended lag recovery

Hi All,

We're currently running slony 2.2.3 with 4 pg 9.0 nodes.  Occasionally since we upgraded from 2.1 I've been
seeing some "humps" where subscribers are lagging and taking an extended period of time to recover.

I can't ever reproduce it and I've come to a dead end.  I'm going to waffle a bit below, but I'm hoping someone
can see something I'm missing.

These humps don't appear to correlate with increased activity on the origin, and I've been
struggling to put my finger on anything aggravating the issue. Today, however, I've seen
the same symptoms, and the start times of the lag align with an exclusive lock on a
subscriber's replicated table whilst a vacuum full was run.

Whilst I'd expect that to cause some lag and a bit of a backlog, the vacuum full took only
2 minutes and the lag builds up gradually afterwards. Eventually, after a long time,
replication catches up, but the recovery is out of proportion to our transaction rate, and
a restart of the slon on the subscriber causes it to catch up very swiftly. I've attached
a graph of sl_status from the origin showing the time and event lag build-up, and a pretty
swift slice at the end of the humps where I restart the slons.

The graph (attached) shows nodes 4 & 5 starting to lag first, as they were the first to
have the vacuum full run; then node 7 starts to lag when it has the same vacuum full run
(at that point the lag on the two other nodes hadn't been noticed). This excerpt from one
of the subscribers shows the copy being blocked:

2015-01-29 10:09:54 GMT [13246]: [39-1] app=slon.local_cleanup,user=slony,db=X,host=somehost
NOTICE:  Slony-I: Logswitch to sl_log_1 initiated
2015-01-29 10:09:54 GMT [13246]: [40-1] app=slon.local_cleanup,user=slony,db=X,host=somehost
CONTEXT:  SQL statement "SELECT "_main_replication".logswitch_start()"
2015-01-29 10:12:04 GMT [13243]: [9-1]
app=slon.remoteWorkerThread_8,user=slony,db=X,host=somehost LOG:  duration: 5089.684 ms 
statement: COPY "_main_replication"."sl_log_1" ( log_origin,
log_txid,log_tableid,log_actionseq,log_tablenspname, log_tablerelname, log_cmdtype,
log_cmdupdncols,log_cmdargs) FROM STDIN
2015-01-29 10:19:05 GMT [13243]: [10-1]
app=slon.remoteWorkerThread_8,user=slony,db=X,host=somehost LOG:  process 13243 still waiting
for RowExclusiveLock on relation 279233 of database 274556 after 1000.038 ms at character 13
2015-01-29 10:19:05 GMT [13243]: [11-1]
app=slon.remoteWorkerThread_8,user=slony,db=X,host=somehost QUERY:  UPDATE ONLY
"myschema"."table_being_full_vacuumed" SET "text" = $1 WHERE "address" = $2;
2015-01-29 10:19:05 GMT [13243]: [12-1]
app=slon.remoteWorkerThread_8,user=slony,db=X,host=somehost CONTEXT:  COPY sl_log_1, line 37: "8   
1108084090    2    1219750937    myschema    table_being_full_vacuumed    U    1    {text,"",address,some_address_data}"
2015-01-29 10:19:05 GMT [13243]: [13-1]
app=slon.remoteWorkerThread_8,user=slony,db=X,host=somehost STATEMENT:  COPY
"_main_replication"."sl_log_1" ( log_origin,
log_txid,log_tableid,log_actionseq,log_tablenspname, log_tablerelname, log_cmdtype,
log_cmdupdncols,log_cmdargs) FROM STDIN
2015-01-29 10:20:43 GMT [13243]: [14-1]
app=slon.remoteWorkerThread_8,user=slony,db=X,host=somehost LOG:  process 13243 acquired
RowExclusiveLock on relation 279233 of database 274556 after 98754.902 ms at character 13
2015-01-29 10:20:43 GMT [13243]: [15-1]
app=slon.remoteWorkerThread_8,user=slony,db=X,host=somehost QUERY:  UPDATE ONLY
"myschema"."table_being_full_vacuumed" SET "text" = $1 WHERE "address" = $2;
2015-01-29 10:20:43 GMT [13243]: [16-1]
app=slon.remoteWorkerThread_8,user=slony,db=X,host=somehost CONTEXT:  COPY sl_log_1, line 37: "8   
1108084090    2    1219750937    myschema    table_being_full_vacuumed    U    1    {text,"",address,some_address_data}"
2015-01-29 10:20:43 GMT [13243]: [17-1]
app=slon.remoteWorkerThread_8,user=slony,db=X,host=somehost STATEMENT:  COPY
"_main_replication"."sl_log_1" ( log_origin,
log_txid,log_tableid,log_actionseq,log_tablenspname, log_tablerelname, log_cmdtype,
log_cmdupdncols,log_cmdargs) FROM STDIN
2015-01-29 10:20:43 GMT [13243]: [18-1]
app=slon.remoteWorkerThread_8,user=slony,db=X,host=somehost LOG:  duration: 98915.154 ms 
statement: COPY "_main_replication"."sl_log_1" ( log_origin,
log_txid,log_tableid,log_actionseq,log_tablenspname, log_tablerelname, log_cmdtype,
log_cmdupdncols,log_cmdargs) FROM STDIN
2015-01-29 10:22:00 GMT [13246]: [41-1] app=slon.local_cleanup,user=slony,db=X,host=somehost
NOTICE:  Slony-I: log switch to sl_log_1 complete - truncate sl_log_2
2015-01-29 10:22:00 GMT [13246]: [42-1] app=slon.local_cleanup,user=slony,db=X,host=somehost
CONTEXT:  PL/pgSQL function "cleanupevent" line 94 at assignment
2015-01-29 10:34:01 GMT [13246]: [43-1] app=slon.local_cleanup,user=slony,db=X,host=somehost
NOTICE:  Slony-I: Logswitch to sl_log_2 initiated
2015-01-29 10:34:01 GMT [13246]: [44-1] app=slon.local_cleanup,user=slony,db=X,host=somehost
CONTEXT:  SQL statement "SELECT "_main_replication".logswitch_start()"
2015-01-29 10:46:08 GMT [13246]: [45-1] app=slon.local_cleanup,user=slony,db=X,host=somehost
NOTICE:  Slony-I: could not lock sl_log_1 - sl_log_1 not truncated
2015-01-29 10:46:08 GMT [13246]: [46-1] app=slon.local_cleanup,user=slony,db=X,host=somehost
CONTEXT:  PL/pgSQL function "cleanupevent" line 94 at assignment

After this the copies go through cycles of increasing and decreasing duration, which I'm
guessing is normal (perhaps syncs being grouped?), and I'm seeing messages stating "could
not lock sl_log_1 - sl_log_1 not truncated" a couple of times before the switch completes;
again, I'm guessing this is just blocking from the inserts capturing changes, and is
normal? Autovacuum hasn't hit sl_log at all during this period.
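
One extra thing I've been watching while this happens is the sl_log_status
sequence, which (if I'm reading the source right) encodes the switch state:
0 or 1 when sl_log_1 or sl_log_2 is the active table, and 2 or 3 while a
switch to one or the other is still pending:

    -- log switch state for our _main_replication cluster
    SELECT last_value FROM "_main_replication".sl_log_status;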

Does anyone have any ideas? I've got debug logs from the slons, and postgres logs, which
I can send off-list.

Thanks
Glyn
_______________________________________________
Slony1-general mailing list
Slony1-general@...
http://lists.slony.info/mailman/listinfo/slony1-general
Stéphane Schildknecht | 27 Jan 19:31 2015

slony1-ctl 1.3.0 released

Hello,

The slony1-ctl development team is proud to announce version 1.3.0 of
slony1-ctl, a collection of shell scripts aiming at simplifying the everyday
administration of a Slony replication setup.

This version adds no new features, but brings compatibility with Slony 2.2.
The major changes are a better use of variables and a deep cleaning of the
comments.

The project homepage:
  http://pgfoundry.org/projects/slony1-ctl/

The package may be downloaded at:
  http://pgfoundry.org/frs/download.php/3838/slony1-ctl-REL1_3_0.tar.gz

Best regards,
--
Stéphane Schildknecht
Contact régional PostgreSQL pour l'Europe francophone
Loxodata - Conseil, expertise et formations

_______________________________________________
Slony1-general mailing list
Slony1-general@...
http://lists.slony.info/mailman/listinfo/slony1-general
Dave Cramer | 21 Jan 15:24 2015

Lots of data in sl_log_? but sl_status shows lag of 0

How is this possible?

Dave Cramer
_______________________________________________
Slony1-general mailing list
Slony1-general@...
http://lists.slony.info/mailman/listinfo/slony1-general
Steve Singer | 19 Jan 05:22 2015

Slony 2.2.4 released

The Slony team is pleased to announce Slony 2.2.4, the next minor release
of the Slony 2.2.x series.

Slony 2.2.4 includes the following changes:

  - Bug 352 :: Handle changes from PG HEAD ("9.5")
  - Bug 349 :: Issue with quoting of cluster name - only hit when 
processing DDL
  - Bug 350 :: Make cleanup_interval config parameter work as expected
  - Include alloca.h in slonik (fix for Solaris)
  - Bug 345 :: Fix bug when dropping multiple nodes at once
  - Bug 354 :: Fix race condition in FAILOVER
  - Bug 356 :: Perform TRUNCATE ONLY on replicas (when replicating a 
truncate)

Slony 2.2.4 can be downloaded from the following URL:

http://main.slony.info/downloads/2.2/source/slony1-2.2.4.tar.bz2
Sebastien Marchand | 7 Jan 11:15 2015

too much work for pg with 17 nodes...

Hi,

I’m using Slony 2.2.2 with postgresql 9.3.

Slony works well, but I find it makes PostgreSQL itself work too hard: too many accesses, too many processes, ...

For information, Slony runs over a WAN with 17 nodes and 18 replications.

Replication A

Server X -> server 1 (schema base)
Server X -> server 2 (schema base)
Server X -> server 3 (schema base)
Server X -> server 4 (schema base)
...
Server X -> server 17 (schema base)

Replication B

Server 1 -> server X (schema backbase1)
Server 2 -> server X (schema backbase2)
Server 3 -> server X (schema backbase3)
Server 4 -> server X (schema backbase4)
...
Server 17 -> server X (schema backbase17)

The local monitor was deactivated to decrease the impact on pg.

I'm using this conf:

check interval 200
interval timeout 60000
group size 50
desired sync time 60000
cleanup cycles 0
log level 0

I don't know if someone can help me, but I would like to know whether my organisation with several slons is the right way to go.

Would it be better to use one replication with 17 sets? Perhaps...

PS: sorry for the earlier mail, sent by mistake.

Best regards,

Sébastien Marchand
Société SGO

_______________________________________________
Slony1-general mailing list
Slony1-general@...
http://lists.slony.info/mailman/listinfo/slony1-general
Sebastien Marchand | 7 Jan 11:05 2015

too much work!

Hi,

I’m using Slony 2.2.2 with postgresql 9.3.

Slony works well, but I find it makes PostgreSQL itself work too hard: too many accesses, too many processes, ...

For information, Slony runs over a WAN with 17 nodes and 18 replications.

Replication A

Server X -> server 1 (schema base)
Server X -> server 2 (schema base)
Server X -> server 3 (schema base)
Server X -> server 4 (schema base)
...
Server X -> server 17 (schema base)

Replication B

Server 1 -> server X (schema backbase1)

Best regards,

Sébastien Marchand

Société SGO

 

_______________________________________________
Slony1-general mailing list
Slony1-general@...
http://lists.slony.info/mailman/listinfo/slony1-general
