Karl Denninger | 1 Aug 2010 06:19

Aieeee! Problem with 2.0.4

I upgraded somewhat-recently from 2.0.2 to 2.0.4, and now I've got a
serious problem.

The reason for the "gotta do it now" was that somehow one of the tables
got out of sync, and a delete was failing to propagate - hanging the
process.

OK, ok, so 2.0.2 with Postgres 8.4.4 is a bit old and mismatched.  So I
upgraded to 2.0.4 on all the nodes, and told the subscriber to reload -
ditched the client config and re-subscribed the sets.

All went well until a very large table came up - it failed.

There's no error in the logs indicating why, other than the following:

Jul 31 22:52:53 dbms TICKER[70295]: [153-1] CONFIG remoteWorkerThread_3:
copy table "public"."images"
Jul 31 22:52:53 dbms TICKER[70295]: [154-1] CONFIG remoteWorkerThread_3:
Begin COPY of table "public"."images"
Jul 31 22:54:24 dbms TICKER[70295]: [155-1] ERROR  remoteWorkerThread_3:
PGgetCopyData() server closed the connection unexpectedly
Jul 31 22:54:24 dbms TICKER[70295]: [155-2]     This probably means the
server terminated abnormally
Jul 31 22:54:24 dbms TICKER[70295]: [155-3]     before or while
processing the request.
Jul 31 22:54:24 dbms TICKER[70295]: [156-1] WARN   remoteWorkerThread_3:
data copy for set 1 failed 1 times - sleep 15 seconds

And in 15 seconds, the entire process of trying to re-init the node
starts over - from the beginning!

Karl Denninger | 1 Aug 2010 08:00

Re: Aieeee! Problem with 2.0.4

Karl Denninger wrote:
> I upgraded somewhat-recently from 2.0.2 to 2.0.4, and now I've got a
> serious problem.
>
> The reason for the "gotta do it now" was that somehow one of the tables
> got out of sync, and a delete was failing to propagate - hanging the
> process.
>
> OK, ok, so 2.0.2 with Postgres 8.4.4 is a bit old and mismatched.  So I
> upgraded to 2.0.4 on all the nodes, and told the subscriber to reload -
> ditched the client config and re-subscribed the sets.
>
> All went well until a very large table came up - it failed.
>
> There's no error in the logs indicating why, other than the following:
>
> Jul 31 22:52:53 dbms TICKER[70295]: [153-1] CONFIG remoteWorkerThread_3:
> copy table "public"."images"
> Jul 31 22:52:53 dbms TICKER[70295]: [154-1] CONFIG remoteWorkerThread_3:
> Begin COPY of table "public"."images"
> Jul 31 22:54:24 dbms TICKER[70295]: [155-1] ERROR  remoteWorkerThread_3:
> PGgetCopyData() server closed the connection unexpectedly
> Jul 31 22:54:24 dbms TICKER[70295]: [155-2]     This probably means the
> server terminated abnormally
> Jul 31 22:54:24 dbms TICKER[70295]: [155-3]     before or while
> processing the request.
> Jul 31 22:54:24 dbms TICKER[70295]: [156-1] WARN   remoteWorkerThread_3:
> data copy for set 1 failed 1 times - sleep 15 seconds
>
> And in 15 seconds, the entire process of trying to re-init the node

Karl Denninger | 1 Aug 2010 19:55

Eek Part II - No, SSL was not the entire problem (Replication COPY fails repeatedly)

So last night, if you remember my previous missive, I thought I had
found the issue with a big table copy having to do with PostgreSQL's SSL
support, and went to bed with it running with SSL off.

Well, I was wrong, as I found out this morning after the thing ran
for more than four hours - probably about the amount of time required to
actually complete the job.  (It actually failed TWICE and restarted
overnight.)

Aug  1 06:32:25 dbms TICKER[77422]: [153-1] CONFIG remoteWorkerThread_3:
copy table "public"."images"
Aug  1 06:32:25 dbms TICKER[77422]: [154-1] CONFIG remoteWorkerThread_3:
Begin COPY of table "public"."images"
Aug  1 10:09:08 dbms TICKER[77422]: [155-1] ERROR  remoteWorkerThread_3:
copy from stdin on local node - PGRES_FATAL_ERROR server closed the
connection unexpectedly
Aug  1 10:09:08 dbms TICKER[77422]: [155-2]     This probably means the
server terminated abnormally
Aug  1 10:09:08 dbms TICKER[77422]: [155-3]     before or while
processing the request.
Aug  1 10:09:08 dbms TICKER[77422]: [156-1] WARN   remoteWorkerThread_3:
data copy for set 1 failed 1 times - sleep 15 seconds
Aug  1 10:09:08 dbms TICKER[77422]: [157] ERROR  remoteWorkerThread_3:
"rollback transaction" PGRES_FATAL_ERROR
Aug  1 10:09:08 dbms TICKER[72097]: [5-1] INFO   slon: retry requested
Aug  1 10:09:08 dbms TICKER[72097]: [6-1] INFO   slon: notify worker
process to shutdown

The problem is that I really don't have anything untoward in the
postgres log file this time, except for:

Neetin Kumar | 3 Aug 2010 08:24

Reg:- Database size in postgres

Hi,

I am new to Postgres. My database's actual size is 2 GB, but after
replication it has grown to 47 GB. I tried to run VACUUM, but it fails
with the error: No space left on disk.

Please suggest what I should do.

Can I remove the pg_xlog files? They have grown to 2.4 GB.

If yes, how can I remove the log files?

Thanks,
Neetin

Stéphane A. Schildknecht | 3 Aug 2010 15:11

Re: Eek Part II - No, SSL was not the entire problem (Replication COPY fails repeatedly)


On 01/08/2010 19:55, Karl Denninger wrote:
> So last night, if you remember my previous missive, I thought I had
> found the issue with a big table copy having to do with PostgreSQL's SSL
> support, and went to bed with it running with SSL off.
> 
> Well, I was wrong, as I was treated to this morning after the thing ran
> for more than four hours - probably about the amount of time required to
> actually complete the job.  (It actually failed TWICE and restarted
> overnight.)
> 
> Aug  1 06:32:25 dbms TICKER[77422]: [153-1] CONFIG remoteWorkerThread_3:
> copy table "public"."images"
> Aug  1 06:32:25 dbms TICKER[77422]: [154-1] CONFIG remoteWorkerThread_3:
> Begin COPY of table "public"."images"
> Aug  1 10:09:08 dbms TICKER[77422]: [155-1] ERROR  remoteWorkerThread_3:
> copy from stdin on local node - PGRES_FATAL_ERROR server closed the
> connection unexpectedly
> Aug  1 10:09:08 dbms TICKER[77422]: [155-2]     This probably means the
> server terminated abnormally
> Aug  1 10:09:08 dbms TICKER[77422]: [155-3]     before or while
> processing the request.
> Aug  1 10:09:08 dbms TICKER[77422]: [156-1] WARN   remoteWorkerThread_3:
> data copy for set 1 failed 1 times - sleep 15 seconds
> Aug  1 10:09:08 dbms TICKER[77422]: [157] ERROR  remoteWorkerThread_3:
> "rollback transaction" PGRES_FATAL_ERROR
> Aug  1 10:09:08 dbms TICKER[72097]: [5-1] INFO   slon: retry requested
> Aug  1 10:09:08 dbms TICKER[72097]: [6-1] INFO   slon: notify worker
> process to shutdown
> 

Steve Singer | 3 Aug 2010 15:15

Re: [slony-general] moving tables to another schema

Jaime Casanova wrote:
> Hi,
> 
> I want to move some replicated tables to a new schema, so i execute
> "ALTER TABLE SET SCHEMA new_schema" via the SLONIK EXECUTE SCRIPT
> command and the result was:
> - on origin: everything is ok, table moved and sl_table fixed
> - on subscriber: table was moved but sl_table still says that the
> table is in public (i guess i can update sl_table manually but i
> prefer it happens automagically)
> 
> this was on pg 8.4 with slony 1.2.20
> 
> I also tried to rename the schema via the SLONIK EXECUTE SCRIPT
> command and on subscriber the schema wasn't renamed.

I think this is a bug.

The function _@CLUSTERNAME@.updateRelname adjusts sl_table to
reflect the name in the catalog.  It is being called by
ddlScript_complete(), which only runs on the event node of an execute
script.  I think we really want to call this function from
ddlScript_complete_int(), which gets called on both nodes.

You could call updateRelname directly but I don't see how that is any 
better than doing the update yourself.

-- 
Steve Singer
Afilias Canada

Andrew Sullivan | 3 Aug 2010 17:22

Re: Reg:- Database size in postgres

On Tue, Aug 03, 2010 at 11:54:56AM +0530, Neetin Kumar wrote:

> I am new to postgres My database actual size is 2 GB after replication 
> its become 47 GB i tried to run VACUUM but i am not able to do so 
> because getting error :No space left on disk.

You need more disk.  You can solve this by shutting postgres down,
moving some of the files to another filesystem, and symlinking them
back into their previous location.  Also, look for log files
(human-readable log files) or something else that you can safely cut
down to get yourself more room.

> Can I remove pg_xlog files its size become 2.4 GB.

No, lordy, no.  That's the transaction log.  If you remove it you will
destroy the database.

A big question is how your database got so big.  Are you quite sure
you're not getting a lot of errors (which leave dead tuples around)?

A
-- 
Andrew Sullivan
ajs@...

Christopher Browne | 4 Aug 2010 20:17

Re: slon memory usage

Steve Singer <ssinger@...> writes:
> I see a few choices
>
> 1) We can have slon explicitly request a more reasonable thread stack 
> size before it creates threads (pthreads has API calls for this).  I'm 
> not 100% sure what the best value for this would be for all platforms 
> but slon tends to be pretty good about not storing large values on the stack
>
> 2) We can document this and tell sysadmins/DBAs that are concerned 
> about the slon memory footprint to use their OS's facilities (i.e. ulimit 
> -s before invoking slon) to adjust how much stack each thread gets.
>
> The argument for 1 is that the slony development team has a better sense 
> of how much stack memory slon threads take, and we might even discover 
> that different thread types have different stack memory requirements (I 
> doubt the different slony thread types will vary much in stack usage)
>
> The argument for 2 is that the OS is already providing facilities to 
> tune this and we should leave it tunable through those.
>
> Thoughts?

I agree with your reasoning for both 1 and 2...

I'll throw in another consideration that generally favours 2.

There are folks that run slon under Windows, which may have
substantially different behaviour.  I don't know how memory tuning works
there, and we might easily do a Unix-centric thing here that would break
Windows porting.

Christopher Browne | 4 Aug 2010 20:24

Re: Aieeee! Problem with 2.0.4

Karl Denninger <karl@...> writes:
> This isn't SLONY's issue, but it's definitely a problem.  I'll report it
> over on the Postgres list in the morning...

I'll note this in the FAQ.
-- 
let name="cbbrowne" and tld="ca.afilias.info" in String.concat " <at> " [name;tld];;
Christopher Browne
"Bother,"  said Pooh,  "Eeyore, ready  two photon  torpedoes  and lock
phasers on the Heffalump, Piglet, meet me in transporter room three"

raghu ram | 5 Aug 2010 12:31

WARN :: remoteWorker_event: event xxxx ignored - unknown origin --- Slony logs

Hi Slony gurus,

We had a replication setup like this:

                                Master --> slave
                                                 master --> slave

The Slony logs show the following warnings:

2010-08-05 03:01:39 PDT WARN   remoteWorker_event: event 2,334 ignored - unknown origin
2010-08-05 03:01:39 PDT WARN   remoteWorker_event: event 2,335 ignored - unknown origin
2010-08-05 03:01:39 PDT WARN   remoteWorker_event: event 2,336 ignored - unknown origin
2010-08-05 03:01:39 PDT WARN   remoteWorker_event: event 2,337 ignored - unknown origin
2010-08-05 03:01:39 PDT WARN   remoteWorker_event: event 2,338 ignored - unknown origin
2010-08-05 03:01:39 PDT WARN   remoteWorker_event: event 2,339 ignored - unknown origin
2010-08-05 03:01:39 PDT WARN   remoteWorker_event: event 2,340 ignored - unknown origin
2010-08-05 03:01:39 PDT WARN   remoteWorker_event: event 2,341 ignored - unknown origin

The Slony documentation says: "Probably happens if events arrive before the STORE_NODE event that tells that the new node now exists".

Could you please let me know how to avoid these problems in the replication setup?


Regards
Raghu
_______________________________________________
Slony1-general mailing list
Slony1-general@...
http://lists.slony.info/mailman/listinfo/slony1-general
