dandl | 5 May 03:47 2016

CREATE OR REPLACE AGGREGATE -- NOT!

I was surprised to discover that

- CREATE OR REPLACE AGGREGATE is not allowed, but

- CREATE OR REPLACE FUNCTION works just fine

 

Is this an oversight, or is there some reason I missed?

 

Yes, I know I can do:

DROP AGGREGATE IF EXISTS

CREATE AGGREGATE

 

But this triggers a NOTICE. Is there any convenient way to replace a (possibly) existing aggregate function without triggering a NOTICE?
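For concreteness, here is the pattern with a toy aggregate (my_sum is invented purely for illustration); raising client_min_messages does hide the NOTICE, but only as a session-wide blunt instrument rather than a per-statement option:

    SET client_min_messages = warning;  -- silences the NOTICE from DROP ... IF EXISTS
    DROP AGGREGATE IF EXISTS my_sum(integer);
    CREATE AGGREGATE my_sum(integer) (
        SFUNC    = int4pl,
        STYPE    = integer,
        INITCOND = '0'
    );
    RESET client_min_messages;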

 

Regards

David M Bennett FACS

Andl - A New Database Language - andl.org

 

Kurt Roeckx | 5 May 00:22 2016
Picon

Very slow update / hash join

Hi,

I have an update query that's been running for 48 hours now.
Since it started it has used about 2.5% CPU, has been writing to
disk at about 3 MB/s, and reading at about 2 MB/s.  It's mostly
waiting for the disks.

The query plan looks like this:
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Update on certificates c  (cost=1224052.45..60710389.31 rows=19950420 width=1371)
   ->  Hash Join  (cost=1224052.45..60710389.31 rows=19950420 width=1371)
         Hash Cond: (c.id = cu.id)
         ->  Seq Scan on certificates c  (cost=0.00..8372137.31 rows=147868231 width=1258)
         ->  Hash  (cost=623981.20..623981.20 rows=19950420 width=117)
               ->  Seq Scan on certificates_update cu  (cost=0.00..623981.20 rows=19950420 width=117)
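
The statement itself is of this general shape (the tables and the join
column are taken from the plan; the column in the SET list is just a
placeholder for the real column list):

    UPDATE certificates c
       SET some_column = cu.some_column
      FROM certificates_update cu
     WHERE c.id = cu.id;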

I've set the work_mem for this query to 6GB, which seems to be
enough to keep pgsql_tmp empty; when it was only set to 1 GB it
did have files in it.  The process is using about 4 GB of RAM, of
which 0.5 GB probably comes from the shared_buffers.

It did use 100% CPU at the start, but that was for about 80
seconds.  I'm guessing that's the time it needs to read and hash
the update table.  But from that point on, it gets really slow.

As you can see, the table is quite large and I want to update
about 20M of its 133M rows (not sure why the plan says 147M).

The table itself is 53GB, and the table it updates from is only
3.3 GB.  There are indexes on some of the fields (like the id),
but none of the indexed columns are being updated; I did try
removing the indexes on columns that do get updated, but that had
little effect.  The table has foreign keys to other tables, and
other tables reference it, but none of the key columns should get
updated.

Reading or writing the whole table shouldn't take that long, and I
have no idea why it's this slow.  Does anybody have an idea what
could cause this?

From what I understand, the hash join should be the one I want;
I tried forcing the other join methods, but that didn't seem to
improve anything.

Kurt


rob stone | 4 May 22:55 2016

Debian and Postgres

Hello,

This is a Debian problem that is now causing trouble starting Postgres.
So I'm posting it here in the hope that somebody has experienced it
previously and can assist me in fixing it.

Yesterday it started up as normal, and the first command issued via the
JDBC driver ran fine:-

2016-05-04 10:39:39 AESTLOG:  MultiXact member wraparound protections
are now enabled
2016-05-04 10:39:39 AESTLOG:  database system is ready to accept
connections
2016-05-04 10:39:39 AESTLOG:  autovacuum launcher started
2016-05-04 10:40:00 AESTLOG:  duration: 224.906 ms  parse <unnamed>:
SET extra_float_digits = 3

Today, the following was in the log:-

2016-05-05 03:44:53 AESTLOG:  test message did not get through on
socket for statistics collector
2016-05-05 03:44:53 AESTLOG:  disabling statistics collector for lack
of working socket
2016-05-05 03:44:53 AESTWARNING:  autovacuum not started because of
misconfiguration
2016-05-05 03:44:53 AESTHINT:  Enable the "track_counts" option.
2016-05-05 03:44:53 AESTLOG:  database system was shut down at 2016-05-
04 11:56:37 AEST
2016-05-05 03:44:54 AESTLOG:  MultiXact member wraparound protections
are now enabled
2016-05-05 03:44:54 AESTLOG:  database system is ready to accept
connections
2016-05-05 03:58:29 AESTLOG:  duration: 787.241 ms  statement: select
count(*) from boiler_plate;

Relevant section of the conf file, unchanged for months:-

# - Query/Index Statistics Collector -

#track_activities = on
track_counts = on
#track_io_timing = off
#track_functions = none                 # none, pl, all
#track_activity_query_size = 1024       # (change requires restart)
stats_temp_directory = 'pg_stat_tmp'

# - Statistics Monitoring -

#log_parser_stats = off
#log_planner_stats = off
#log_executor_stats = off
#log_statement_stats = off

I can connect via psql and issue queries without any problems. Trying
to connect via JDBC fails. Trying to connect by an application fails.

I use synaptic for package management. Yesterday I was running short on
space and removed some packages described as *obsolete* by synaptic.
My guess is that a dependency chain is incorrect and a certain package
is not in fact obsolete.
For some reason the synaptic history log only shows tzdata-java as
being removed but in fact several libraries and other packages were
permanently removed. The dpkg logs only show installation details.
I don't know why synaptic failed to record the removals in its history
logs; if it had, I could trawl through the logs and restore the
missing packages.
The apt logs do not list removals either.

Versions:-

Java JDK version 1.8.0_91
Linux [ 4.5.0-1-amd64 ]
JDBC 9.4-1208
Postgres psql (9.5.2)
libpq5_9.5.2-1_amd64.deb

Any assistance appreciated.

Thanks,
Rob


Guyren Howe | 4 May 18:43 2016

Debugging code on server?

This is a fork from my "Love your database" question. It's a separable concern, so I moved it here.

Let's say I want to use Postgres' *amazing* support for lots of languages. I want to use JavaScript or Perl or
Ruby or something. How do I debug the code?

I can imagine things you might set up: using LISTEN/NOTIFY to feed a simple logger that would let you see what's
going on. There are breakpoints and such in pgAdmin, but I'll bet that doesn't work for other languages.
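
Something like this minimal sketch is what I have in mind (the function and channel names are invented); one caveat is that notifications are only delivered at commit, so an aborted transaction swallows them:

    -- tiny logging helper, called from the code under test with e.g. PERFORM debug_log('reached step 2');
    CREATE OR REPLACE FUNCTION debug_log(msg text) RETURNS void AS $$
    BEGIN
        PERFORM pg_notify('debug_log', msg);
    END;
    $$ LANGUAGE plpgsql;

    -- in a separate psql session, watch the messages as they arrive:
    LISTEN debug_log;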

Any thoughts?


Will McCormick | 4 May 18:14 2016

Re: Thoughts on "Love Your Database"

I agree it's typically political, but so are most things in business. Examples: companies buy other companies, and the acquired company uses a competitor's data store that you want to replace; a company needs to compete and wants to reduce cost. These are not technical requirements, and it's certainly not vapor ideology. I have only worked for startups and have seen this happen at every company I have worked for; yes, it is political, but yes, it happens. Smaller companies are more susceptible to it.

The reality is somewhere in the middle, as it often is. My point is that you don't have to replace a million lines of code if you plan upfront. If you don't, you do.


On Wed, May 4, 2016 at 11:29 AM, Uwe Schroeder <uwe <at> oss4u.com> wrote:

On Wed, May 04, 2016 11:05:25 AM Will McCormick wrote:

A reason to consider may be portability. What happens if I want to let my customer choose their data store, or I just don't want to put all my eggs in one basket? Technically there are truths, but you cannot ignore the business side either. If we can exceed our performance requirements and keep things generic/portable, this is the best of both worlds. I think this is the main reason people separate the business logic from the database. How many of you have ported databases between platforms? Or had multiple types of data stores in the same company?

I have been waiting for the portability argument for the last 20+ posts :-) Everyone who has done any type of consulting or work in this field knows that, at the latest, the second argument from management is “portability”. I have yet to see anyone who really needed to move to a different database. If they did, it usually is a political issue and not a technical one (uhh, we hired this new operations manager and MS SQL is so much better than postgresql …). Unless you develop generic software to be sold to many clients, the choice of data storage is rarely a real concern, unless someone who feels the urge to be important starts throwing words he read in a magazine at management. None of my clients ever questioned the database. They either had one which they wanted to use (i.e. they had a big iron IBM with DB2 on it plus a DBA who knew what he was doing) or they didn't care as long as it worked as expected and came with a maintenance contract of sorts.

If you're developing a web application, the least concern is portability (you're not going to replace the million lines of PHP either). The biggest concern should lie with data integrity and APIs that protect your data and business logic, so you can hire that cheap app development company from overseas your boss recommended (based on a spam email) without running the risk of compromising everything.

Uwe


Klaus P. | 4 May 10:28 2016

Do parallel queries work with only dblink not with fdw?

In order to improve cpu and disk utilization, I am testing parallel queries.

The approach with dblink_send_query() and dblink_get_result() works in my proof of concept. Runtime of my reference query was reduced from 55 seconds to ~20 seconds using 4 parallel connections. Not as much as I had hoped, but certainly a significant improvement.

My approach is fairly easy:

SELECT dblink_send_query('conn1', 'statement based on partitioning field');
SELECT dblink_send_query('conn2', 'statement based on partitioning field');

...

SELECT * FROM dblink_get_result('conn1') AS t1(... column list matching the statement ...)
UNION ALL
SELECT * FROM dblink_get_result('conn2') AS t2(... column list matching the statement ...)

...
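
(For completeness: before the statements above, the named connections are opened with dblink_connect; the connection string here is just a placeholder.)

SELECT dblink_connect('conn1', 'dbname=... host=...');
SELECT dblink_connect('conn2', 'dbname=... host=...');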


However, using foreign data wrappers (FDW), I was not able to run any query in parallel. I came across this presentation http://de.slideshare.net/EnterpriseDB/foreign-data-wrappers-and-you-with-postgres where it says on page 12 "parallel FDW access". Is there any example available of how to run FDW queries in parallel?

My approach with FDW is similar to the dblink example above:

SELECT * FROM myFdwTable WHERE (clause based on partitioning field)
UNION ALL
SELECT * FROM myFdwTable WHERE (clause based on partitioning field)
...

In my experience, however, the statements are carried out sequentially.

Thanks

Klaus Pieper
Thomas Kellerer | 4 May 08:46 2016

MVIEW refresh consistently faster than insert ... select

Hello,

I have a table that is an aggregation of another table. 
This aggregation reduces an input of ~14 million rows to ~4 million rows. 

So far I have used a truncate/insert approach for this: 

    truncate table stock;
    insert into stock (product_id, warehouse_id, reserved_provisional, reserved, available, non_deliverable)
    select product_id, warehouse_id, sum(reserved_provisional), sum(reserved), sum(available), sum(non_deliverable)
    from staging.stock_data
    group by product_id, warehouse_id;

The table has a primary key on (product_id, warehouse_id), all columns are integer columns.
The refresh takes approx 2 minutes (fastest was 1:40) on our development server (CentOS, Postgres 9.5.0) 

However, when I create a materialized view: 

    create materialized view mv_stock 
    as
    select product_id, 
           warehouse_id, 
           sum(reserved_provisional) as reserved_provisional, 
           sum(reserved) as reserved, 
           sum(available) as available, 
           sum(non_deliverable) as non_deliverable
    from staging.stock_data
    group by product_id, warehouse_id;

    create unique index pk_mv_stock on mv_stock (product_id, warehouse_id);

Subsequent refreshes using "REFRESH MATERIALIZED VIEW mv_stock" are consistently much faster: between
40 seconds and 1 minute.

I have run both refreshes about 10 times now, so caching effects should not be a factor. 

My question is: what is the materialized view refresh doing differently than a plain insert ... select that
makes it so much faster? 

The select itself takes about 25 seconds. It is using an external merge on disk, which can be avoided by
increasing work_mem (the select then goes down to 12 seconds, but that doesn't change much in the runtime
of the refreshes). 

The 2 minutes are perfectly acceptable; I'm just curious why refreshing the mview would be so much faster when
the work they are doing should be exactly the same. 

Thomas


Guyren Howe | 4 May 06:11 2016

Thoughts on "Love Your Database"

I've long been frustrated with how most web developers I meet have no idea how to use an SQL database
properly. I think I'm going to write a book called Love Your Database, aimed at web developers, that
explains how to make their apps better by leveraging the power of SQL in general, and Postgres in particular.

I'm thinking of a section on features of SQL most folks don't know about (CTEs are *way* the hell at the top of
that list, but also EXCEPT/INTERSECT and window functions), but much of the book would be about how to do
things server side. Benchmarks showing how much faster this can be, but mostly techniques — stored
procedures/triggers/rules, views.
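
For instance, the kind of two-step query I'd want readers to be comfortable with (the orders table and its columns are invented for illustration):

    -- monthly totals plus a running total, via a CTE and a window aggregate
    WITH monthly AS (
        SELECT date_trunc('month', ordered_at) AS month,
               sum(amount)                     AS total
        FROM   orders
        GROUP  BY 1
    )
    SELECT month,
           total,
           sum(total) OVER (ORDER BY month) AS running_total
    FROM   monthly
    ORDER  BY month;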

I asked a colleague about the advice I often hear stated but seldom justified, that one shouldn't put
business rules in the database. He offered that server-side code can be hard to debug.

I'm sure many here would love to see such a book published, maybe some talks on the topic given.

What might I cover that I haven't mentioned? What are the usual objections to server-side code and how can
they be met? When *are* they justified and what should the criteria be to put code in Postgres? Any other
thoughts? Any other websites or books on the topic I might consult?

TIA


Mike Sofen | 4 May 04:06 2016

Re: Function PostgreSQL 9.2

From: David G. Johnston  Sent: Tuesday, May 03, 2016 2:46 PM
To: drum.lucas <at> gmail.com

The only other reasonable option is change your model and requirements to something less complex.

 

 

Seriously, get yourself the books I have recommended and study them  BEFORE you continue attempting to design your database.

You need a firm understanding of logical design & flow, otherwise you will be wasting your time. 

That's what I'm doing.. Studying.. asking for some help to get a better understand.... isn't this the purpose of this mail list?

 

​The purpose of this list is whatever people are willing to make of it - within reason.  I share the sentiment that your particular method of education is becoming burdensome to the people who volunteer their time on these lists to answer questions.  As the alternative is to simply stop replying to your emails be grateful that someone was at least willing to tell you to try other avenues of education.

 

David J.

 

+1   +1   +1   +1:  Lucas – David really nails it: you’ve stopped relying on yourself and are relying on the mercy and generosity of strangers to solve your problems.  It takes a lot of time to read, dissect, understand and then comment on your (or anyone’s) emails, and you’ve proven adept at consuming much of the available oxygen on the list.  Please be a bit more considerate - you’ve received an amazing amount of solid guidance and advice.  I can tell you that when I see your name as the author, I now normally delete the email – that’s NOT the response you want, right?

 

I’ve worked with a bunch of junior developers over the years, some of whom hit a wall and just sit there, waiting for someone to fix them.  Those junior developers “never” become senior developers because their peers soon lose interest in collaborating with them, if you catch my drift…

 

Mike Sofen

john.tiger | 4 May 02:07 2016

ruby pg connection fails on centos - okay on debian and dev machine

Our model.rb runs fine on a dev machine and a Debian server but is
failing on a new CentOS server. I checked the Postgres db name, user
name, and password - all seem fine.

could it be:

host => "localhost"

or maybe CORS?


Steve Clark | 3 May 21:48 2016

Get sum of sums

Hi List,

I have the following table that has netflow data. I can get the top ten receivers with the query below, but I would also like to get
a grand total of the RX Bytes at the same time. I am not exactly sure how to do it. This is with version 8.4.

select ip_dst as "Receiver" ,sum(bytes) as "RX Bytes" from acct_v9 where stamp_inserted >= '2016-04-26' and stamp_inserted <= '2016-04-30' and tag=246 group by ip_dst  order by "RX Bytes" desc limit 10;
    Receiver    |  RX Bytes  
----------------+-------------
 172.24.110.93  | 40363536915
 172.24.110.81  |  6496041533
 172.24.110.123 |  4891514009
 172.24.16.10   |  4540333516
 172.24.110.151 |  4101253631
 192.168.198.71 |  3303066724
 172.24.110.121 |  2529532947
 172.24.110.101 |  2506527294
 172.21.64.99   |  2472842640
 172.24.110.83  |  2232550271
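
I was wondering whether a window aggregate over the grouped sums might do it, something along these lines (an untested sketch; 8.4 does have window functions, and the total would cover all receivers matched by the WHERE clause, not just the top ten):

select ip_dst as "Receiver",
       sum(bytes) as "RX Bytes",
       sum(sum(bytes)) over () as "Total RX Bytes"  -- grand total repeated on every row
from acct_v9
where stamp_inserted >= '2016-04-26'
  and stamp_inserted <= '2016-04-30'
  and tag = 246
group by ip_dst
order by "RX Bytes" desc
limit 10;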


Thanks,
Steve