Willy-Bas Loos | 30 Sep 09:58 2014

synchronize DTAP

Hi,

We have an environment that has a central repository for lookups, which is replicated to several databases, each for a different application.
This has been arranged in a DTAP manner.

Sometimes it is necessary to synchronize the lookups of one of the DTAP branches with another. But I can't just overwrite one database with a dump from another branch, as the consumer databases will not follow.
What I think I need is a way to compute the differences between two databases that have the same schema, and generate insert/update/delete statements from that.
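
For a single lookup table with a primary key, I imagine the diff boils down to something like the following sketch (assuming both branches' copies are visible side by side, e.g. restored into schemas branch_a and branch_b or exposed via postgres_fdw; the table and column names are made up):

-- rows in branch_a that are missing from branch_b (candidate INSERTs)
SELECT a.*
  FROM branch_a.lookup a
  LEFT JOIN branch_b.lookup b ON b.id = a.id
 WHERE b.id IS NULL;

-- rows present in both but with different values (candidate UPDATEs)
SELECT a.id, b.label AS old_label, a.label AS new_label
  FROM branch_a.lookup a
  JOIN branch_b.lookup b ON b.id = a.id
 WHERE a.label IS DISTINCT FROM b.label;

-- rows in branch_b that no longer exist in branch_a (candidate DELETEs)
SELECT b.*
  FROM branch_b.lookup b
  LEFT JOIN branch_a.lookup a ON a.id = b.id
 WHERE a.id IS NULL;

Repeating that for every lookup table and turning the results into statements is exactly what I'd rather not hand-write, hence the question.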

Since this seems like a pretty generic problem, I thought I should ask around before I start writing my own scripts. Does anyone know of a script or application that does this?

Cheers
--
Willy-Bas Loos
pbj | 29 Sep 22:52 2014

BDR Global Sequences

I have a question about BDR Global Sequences.

I've been playing with BDR on PG 9.4beta2, built from source from the
2nd Quadrant GIT page (git://git.postgresql.org/git/2ndquadrant_bdr.git).

When trying a 1000000 row \copy-in, letting PG choose the global sequence
values, I get "ERROR:  could not find free sequence value for global
sequence public.myseq", as documented...no surprise there.  However, the
number of rows I can load before the error varies wildly with each trial.

Is there a way to increase a global sequence's reservation block for each
node, so that I can tell the nodes, "I'm going to load 100M rows now, so
you should get ready for that"?

PJ


Andy Colson | 29 Sep 22:39 2014

PG 9.3 Switch streaming to wal shipping

Hi All.

I have a slave that was using streaming replication, and all was well,
but it's far away (remote data center), and it's using too much bandwidth
at peak times.

I'm switching it to WAL shipping at night.  I don't really need it
constantly up to date; nightly is good enough.

In my recovery.conf, I disabled the primary_conninfo, and have:

restore_command = 'cp /pub/archive/%f %p'
archive_cleanup_command = 'pg_archivecleanup /pub/archive/ %r'
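
For completeness, the whole recovery.conf is now roughly the following sketch (standby_mode is presumably
still 'on' from the original streaming setup):

standby_mode = 'on'
# primary_conninfo = '...'    # disabled, no more streaming
restore_command = 'cp /pub/archive/%f %p'
archive_cleanup_command = 'pg_archivecleanup /pub/archive/ %r'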

I restart PG, and all seems well.  It reached a consistent state and
allows read-only queries.  Cool.  Except it's spamming the log with lines
like:
Sep 29 15:28:13 webserv postgres[17735]: [125-1] ,,2014-09-29 15:28:13.329 CDT,: LOG:  restored log file "00000002000000B900000015" from archive
Sep 29 15:28:18 webserv postgres[17735]: [126-1] ,,2014-09-29 15:28:18.358 CDT,: LOG:  restored log file "00000002000000B900000015" from archive
Sep 29 15:28:23 webserv postgres[17735]: [127-1] ,,2014-09-29 15:28:23.387 CDT,: LOG:  restored log file "00000002000000B900000015" from archive

The answer to that, I thought, was pg_standby.  So I changed:

restore_command = 'pg_standby -d /pub/archive %f %p  2>>/tmp/standby.log'

I restart PG, but now it never reaches consistency, won't allow queries,
and /tmp/standby.log shows it's trying to start from an old file:

Keep archive history: no cleanup required
running restore:      OK
Trigger file:         <not set>
Waiting for WAL file: 00000002000000B900000004
WAL file path:        /pub/archive/00000002000000B900000004
Restoring to:         pg_xlog/RECOVERYXLOG
Sleep interval:       5 seconds
Max wait interval:    0 forever
Command for restore:  cp "/pub/archive/00000002000000B900000004" "pg_xlog/RECOVERYXLOG"
Keep archive history: no cleanup required
WAL file not present yet.

00000002000000B900000004 doesn't exist; I've already moved past it.
(Also note that the log lines I posted above, referencing
00000002000000B900000015, are from much later, after I'd been messing around some more.)

I understand why cp is spamming the log, I guess, but I don't understand
why I can't just replace it with pg_standby.

Any hints on how to set this up?

Thanks for your time.

-Andy


Abelard Hoffman | 29 Sep 04:00 2014

table versioning approach (not auditing)

Hi. I need to maintain a record of all changes to certain tables to assist in viewing history and reverting changes when necessary (customer service makes an incorrect edit, etc.).

I have studied these two audit trigger examples:

I've also read about two other approaches to versioning:
1. maintain all versions in one table, with a flag to indicate which is the current version
2. have a separate versions table for each real table, and insert into the associated version table whenever an update or insert is done.

My current implementation is based on the wiki trigger examples, using a single table and a json column to record the row changes (rather than hstore). What I like about that, in particular, is that I can get a "global", chronological view of all versioned changes very easily.
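
To make that concrete, the versioning table and trigger are roughly shaped like this (a simplified sketch, not my exact DDL; the id column and the app.current_user_id setting are assumptions specific to my schema):

CREATE TABLE versions (
    version_id  bigserial PRIMARY KEY,
    table_name  text        NOT NULL,               -- which table was changed
    row_pk      text        NOT NULL,               -- primary key of the changed row, as text
    user_id     integer     NOT NULL,               -- who made the change
    action      text        NOT NULL,               -- 'I', 'U' or 'D'
    changed_at  timestamptz NOT NULL DEFAULT now(),
    row_data    json                                -- full row image via row_to_json()
);

CREATE OR REPLACE FUNCTION record_version() RETURNS trigger AS $$
DECLARE
    r record;
BEGIN
    IF TG_OP = 'DELETE' THEN
        r := OLD;
    ELSE
        r := NEW;
    END IF;
    INSERT INTO versions (table_name, row_pk, user_id, action, row_data)
    VALUES (TG_TABLE_NAME,
            r.id::text,
            current_setting('app.current_user_id')::int,
            left(TG_OP, 1),
            row_to_json(r));
    RETURN r;
END;
$$ LANGUAGE plpgsql;

-- attached to each versioned table, e.g.:
-- CREATE TRIGGER users_versions AFTER INSERT OR UPDATE OR DELETE
--     ON users FOR EACH ROW EXECUTE PROCEDURE record_version();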

But there are two types of queries I need to run.
1. Find all changes made by a specific user
2. Find all changes related to a specific record

#1 is simple to do. The versioning table has a user_id column of who made the change, so I can query on that.

#2 is more difficult. I may want to fetch all changes to a group of tables that are all related by foreign keys (e.g., find all changes to "user" record 849, along with any changes to their "articles," "photos," etc.). All of the data is in the json column, of course, but it seems like a pain to try and build a query on the json column that can fetch all those relationships (and if I mess it up, I probably won't generate any errors, since the json is so free-form).

So my question is, do you think using the json approach is wrong for this case? Does it seem better to have separate versioning tables associated with each real table? Or another approach?

Thanks


Mehdi Ravanbakhsh | 28 Sep 14:19 2014

call pl/pgsql function from main pl/pgsql function

Hi All

I have one main function in PL/pgSQL and one subset function in PL/pgSQL.

I need to call the subset function from the main function, and I do not need to wait for the subset function to return. I just need to send parameters to the subset function and do not need anything in return.

So I can call multiple subset functions repeatedly.
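
For illustration, this is roughly what I do today (the function names and parameters are made up); the PERFORM blocks until the subset function returns, which is exactly what I want to avoid:

CREATE OR REPLACE FUNCTION subset_function(p_id int, p_note text) RETURNS void AS $$
BEGIN
    -- placeholder for the real work
    RAISE NOTICE 'processing % / %', p_id, p_note;
END;
$$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION main_function() RETURNS void AS $$
BEGIN
    -- synchronous calls: control does not come back until each subset_function() finishes
    PERFORM subset_function(42, 'first batch');
    PERFORM subset_function(43, 'second batch');
END;
$$ LANGUAGE plpgsql;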

Can anyone help with this problem?
snacktime | 28 Sep 01:48 2014

Postgres as key/value store

I'm looking for some feedback on the design I'm using for a basic key/value storage using postgres.

Just some quick background.  This design is for large-scale games that can get up to 10K writes per second or more.  The storage will be behind a distributed memory cache that is built on top of Akka and has a write-behind caching mechanism to cut down on the number of writes when you have many updates to the same key in a short time period, which is common for a lot of multiplayer-type games.

I have been using Couchbase, but this is an open source project, and Couchbase is basically a commercial product for all intents and purposes, which is problematic.  I will still support Couchbase, but I don't want to have to tell people that if you really want to scale, Couchbase is the only option.

The schema is that a key is a string, and the value is a string or binary.  I am actually storing protocol buffer messages, but the library gives me the ability to serialize to native protobuf or to JSON.  JSON is useful at times, especially for debugging.

This is my current schema:

CREATE TABLE entities
(
  id character varying(128) NOT NULL,
  value bytea,
  datatype smallint,
  CONSTRAINT entities_pkey PRIMARY KEY (id)
);

CREATE OR REPLACE RULE entities_merge AS
    ON INSERT TO entities
   WHERE (EXISTS ( SELECT 1
           FROM entities entities_1
          WHERE entities_1.id::text = new.id::text)) DO INSTEAD  UPDATE entities SET value = new.value, datatype = new.datatype
  WHERE entities.id::text = new.id::text;

Additional functionality I want is to do basic fuzzy searches by key.  Currently I'm using a left-anchored LIKE query.  This works well because keys are prefixed with a scope, a delimiter, and then the actual key for the data.  These fuzzy searches would never be used in game logic; they would be admin-only queries for doing things like obtaining a list of players.  So they should be infrequent.
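
If it helps to be concrete, the prefix searches are backed by something like this (a sketch; the varchar_pattern_ops opclass is only needed when the database collation isn't C, and the "player" scope is just an example):

-- prefix index so a left-anchored LIKE can use an index scan instead of a seq scan
CREATE INDEX entities_id_prefix_idx ON entities (id varchar_pattern_ops);

-- example admin query: list all keys in the "player" scope
SELECT id FROM entities WHERE id LIKE 'player:%';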

The scope of the query ability will not expand in the future.  I support multiple backends for the key/value storage so I'm working with the lowest common denominator.  Plus I have a different approach for data that you need to do complex queries on (regular tables and an ORM).

 Chris
David G Johnston | 27 Sep 04:55 2014

Window function with valued based range frames?

ex.  avg(value) over (order by date range interval '6' day preceding)

Source:
http://wwwlgis.informatik.uni-kl.de/cms/fileadmin/courses/SS2008/NEDM/RDDM.Chapter.06.Windows_and_Query_Functions_in_SQL.pdf

Page 9, slide 17

The SELECT SQL command page mentions that standard aspects of window frame
clauses are not supported, but it is not specific (supposedly in order to
have people ask for these things).

Just looking for recollection regarding why these were omitted initially and
whether anyone has considered adding them in a follow-up.  With the recent
hypothetical work being done, maybe these can be re-evaluated in a fresh
light?  They (whatever "they" is) are standard and do seem generally useful.

I don't personally have an immediate need, but I have been pondering
moving-average-related window queries, and how performant they are in
PostgreSQL versus possible alternative calculation means, and came across
this presentation.
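
For concreteness, the kind of moving-average query I have in mind, and the correlated-subquery fallback I'd
use today, look roughly like this (the measurements table is made up):

-- what the standard frame clause would allow (not accepted by PostgreSQL today):
--   avg(value) OVER (ORDER BY obs_date RANGE INTERVAL '6' DAY PRECEDING)

-- fallback: a 7-day trailing average via a correlated subquery
SELECT m.obs_date,
       (SELECT avg(m2.value)
          FROM measurements m2
         WHERE m2.obs_date BETWEEN m.obs_date - 6 AND m.obs_date) AS avg_7d
  FROM measurements m
 ORDER BY m.obs_date;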

David J.



Jonathan Vanasco | 26 Sep 23:02 2014

advice sought - general approaches to optimizing queries around "event streams"


I have a growing database with millions of rows that track resources against an event stream.  

I have a few handfuls of queries that interact with this stream in a variety of ways, and I have managed to
drop things down from 70s to 3.5s on full scans and to offer 0.05s partial scans.

No matter how I restructure queries, I can't seem to get around a few bottlenecks, and I wanted to know if
there are any tips/tricks from the community on how to approach them.

A simple form of my database would be:

	--  1k of
	create table stream (
		id int not null primary key
	);

	-- 1MM of
	create table resource (
		id int not null primary key,
		col_a bool,
		col_b bool,
		col_c text
	);

	-- 10MM of
	create table streamevent (
		id int not null primary key,
		event_timestamp timestamp not null,
		stream_id int not null references stream(id)
	);

	-- 10MM of
	create table resource_2_stream_event(
		resource_id int not null references resource(id),
		streamevent_id int not null references streamevent(id)
	);

Everything is running off of indexes; there are no seq scans.

I've managed to optimize my queries by avoiding joins against tables and by turning the stream interaction
into a subquery or CTE.
Better performance has come from limiting the number of "stream events" (which are only the timestamp and
resource_id off a joined table).
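
To make that concrete, the shape of the approach is roughly this (a sketch against the simplified schema
above, deduplicating resources with DISTINCT ON; the stream id and limit are placeholders):

	with recent_events as (
		select se.id, se.event_timestamp
		from streamevent se
		where se.stream_id = 42
		order by se.event_timestamp desc
		limit 1000
	)
	select distinct on (r.id) r.id, re.event_timestamp
	from recent_events re
	join resource_2_stream_event r2s on r2s.streamevent_id = re.id
	join resource r on r.id = r2s.resource_id
	order by r.id, re.event_timestamp desc;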

The bottlenecks I've encountered have primarily been:

1.	When interacting with a stream, ordering by event_timestamp and deduplicating resources become an
issue.
	I've figured out a novel way to work with the most recent events, but distant events are troublesome.

	using no limit, the query takes 3500 ms
	using a limit of 10000, the query takes 320 ms
	using a limit of 1000, the query takes 20 ms

	There is a dedicated index on event_timestamp (desc), and it is being used according to the planner...
	Finding all the records is fine; merging and sorting the aggregate to handle the deduplication of
records in a stream seems to be the issue (either with DISTINCT or max + group by).

2.	I can't figure out an effective way to search for a term against an entire stream (using a tsquery/GIN
based search; see the sketch after this list).

	I thought about limiting the query by finding matching resources first, then locking it to an event
stream, but:
		- scanning the entire table for a term takes about 10 seconds on an initial hit.  Subsequent
queries for the same terms end up using the cache and complete within 20 ms.

	I get better search performance by calculating the event stream first, then searching it for matching
documents, but I still have the performance issues related to limiting the window of events.
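
The sketch mentioned in point 2 looks roughly like this (col_c stands in for the searched text; the index,
stream id and search term are placeholders):

	create index resource_fts_idx on resource
		using gin (to_tsvector('english', col_c));

	select distinct on (r.id) r.id, se.event_timestamp
	from streamevent se
	join resource_2_stream_event r2s on r2s.streamevent_id = se.id
	join resource r on r.id = r2s.resource_id
	where se.stream_id = 42
	  and to_tsvector('english', r.col_c) @@ to_tsquery('english', 'search_term')
	order by r.id, se.event_timestamp desc;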

I didn't include my actual queries, just the rough sketches above, because I'm more concerned with the general
approaches and ideas behind dealing with large data sets than I am with raw SQL right now.
	
I'm hoping someone can enlighten me with new ways of looking at these problems.  I think I've learned more
about Postgres/SQL in the past 48 hours than I have in the past 15 years, and I'm pretty sure that the
improvements I need will come from new ways of querying data, rather than from optimizing the current queries.


Nelson Green | 26 Sep 21:58 2014

password in recovery.conf

Hello all,

I am setting up a streaming replication standby, and the replication role's password has a single quote in it. I am unable to reference the password properly in the conninfo setting of recovery.conf so that it will authenticate to the master. Doubling the quote gives me a syntax error, and escaping it or quoting it with double quotes gives me an authentication error. The password itself is correct, because I can copy it from recovery.conf and supply it when prompted by pg_basebackup. So, if I may ask: what is the proper way to handle single quotes within the conninfo string?

Obviously I can change the password, but we use an automated password generator, so I'd rather not have to keep generating passwords, and checking them, until I get one that works, unless that's my only option.

Thanks,
Nelson

PANIC: could not create file "pg_xlog/xlogtemp.7884": No space left on device

Hi,

I am using version 9.3.1 and see the "No space left on device" error even though there is enough space (99% free) on my disk.

Error: PANIC:  could not create file "pg_xlog/xlogtemp.7884": No space left on device

I want to know whether this is a bug, and if so, whether a fix or workaround is available.

I am using the default configuration mentioned in the postgresql.conf file.

bash-3.2# du -h
4.0K    ./pg_stat
6.2M    ./base/12026
198M    ./base/16384
6.3M    ./base/12031
6.3M    ./base/1
216M    ./base
24K     ./pg_stat_tmp
4.0K    ./pg_snapshots
4.0K    ./pg_serial
12K     ./pg_multixact/offsets
12K     ./pg_multixact/members
28K     ./pg_multixact
448K    ./global
4.0K    ./pg_xlog/archive_status
129M    ./pg_xlog
4.0K    ./pg_tblspc
12K     ./pg_notify
4.0K    ./pg_twophase
128K    ./pg_clog
4.0K    ./pg_subtrans

Paul Jungwirth | 26 Sep 17:47 2014

Procedure after failover

Hi All,

I have Postgres 9.3 on Ubuntu 14.04 set up in a master/slave
configuration with streaming replication. On the master I ran `sudo
service postgresql stop` and then on the slave I ran `sudo touch
$trigger_file`. Now the slave seems to be running fine, but I'm trying
to figure out the process for getting things back to normal. I think
it is roughly like this, but I'd love for someone to confirm:

1. Change each server's {postgresql,recovery}.conf so the (old) slave
will replicate back to the (old) master (see the sketch after this list).
Restart the (old) slave, then start the (old) master.
2. Once the (old) master has caught up, run `sudo service postgresql
stop` on the (old) slave, then `sudo touch $trigger_file` on the (old)
master. Now the (old) master is a master again.
3. Change each server's {postgresql,recovery}.conf files to their
original settings. Restart the master, then start the slave.
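
For step 1, I'm picturing a recovery.conf on the (old) master roughly like this (a sketch; the conninfo is
redacted the same way as below, and would point at the (old) slave):

standby_mode = 'on'
primary_conninfo = 'XXXXXXX'
trigger_file = '/secure/pgsql/main/trigger'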

Will this work?

What if there were changes on the master that didn't get replicated
before I originally shut it down? (Or does using init.d delay shutdown
until all WAL updates have made it out?)

Is there a better way to do it? Do I need to wipe the (old) master and
use pg_dump/pg_restore before I bring it back up?

If it helps, here is my postgresql.conf on the master:

archive_mode = on
archive_command = 'rsync -aq -e "ssh -o StrictHostKeyChecking=no" %p 10.0.21.10:/secure/pgsql/archive/%f'
archive_timeout = 3600

Here is postgresql.conf on the slave:

hot_standby = on

and recovery.conf on the slave:

standby_mode = 'on'
primary_conninfo = 'XXXXXXX'
trigger_file = '/secure/pgsql/main/trigger'
restore_command = 'cp /secure/pgsql/archive/%f %p'
archive_cleanup_command = '/usr/lib/postgresql/9.3/bin/pg_archivecleanup /secure/pgsql/archive/ %r'

Thanks,
Paul

-- 
_________________________________
Pulchritudo splendor veritatis.


