Yanrui Hu | 18 Nov 09:33 2014

About the tps explanation of pgbench, please help

I am working on an evaluation of putting the DB client outside the datacenter, to understand how the network impacts the business.
After several rounds of testing, I have a question about the two tps results in the pgbench output.

Test A:
Client and DB server exist in same AWS datacenter.
transaction type: Custom query
scaling factor: 500
query mode: simple
number of clients: 25
number of threads: 25
duration: 600 s
number of transactions actually processed: 54502
tps = 90.814930 (including connections establishing)
tps = 204.574432 (excluding connections establishing)

Test B:
Client and DB server exist in different AWS datacenter (west and east).
transaction type: Custom query
scaling factor: 500
query mode: simple
number of clients: 25
number of threads: 25
duration: 600 s
number of transactions actually processed: 13966
tps = 23.235705 (including connections establishing)
tps = 42.915990 (excluding connections establishing)

Obviously both tps numbers become lower when the client and server are not in the same datacenter, since the network connection has more latency.
But I cannot explain why the tps excluding connections establishing changes so much.
As I understand it, the tps excluding connections establishing removes the time spent establishing the connections (creating the sockets). That means that in the two test cases above (where only the network differs), the tps excluding connections establishing should be very close, right? The database and its capacity are the same; only the network latency is different.
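
For reference, a run like the above would correspond to an invocation roughly like this (the script file name and host are placeholders; the actual command was not shown):

    pgbench -h <db-host> -c 25 -j 25 -T 600 -f custom_query.sql <dbname>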


--
Best Regards,

Yanrui Hu (Ray)
Roger Pack | 18 Nov 00:18 2014

count distinct slow?

Hello.

As a note, I ran into the following today: doing a select distinct is fast, but doing a count distinct is significantly slower.

Assume a table "issue" with a column nodename character varying(64), 7.5M rows...

select distinct  substring(nodename from 1 for 9) from issue;

-- 5.8s

select count(distinct substring(nodename from 1 for 9)) from issue;

-- 190s


SELECT COUNT(*) FROM (SELECT DISTINCT substring(nodename from 1 for 9) FROM issue) as temp; 

-- 5.5s

I have an index on nodename's substring 1 for 9.
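
Roughly, that index definition would look like this (the index name here is illustrative; the exact DDL was not posted):

CREATE INDEX issue_nodename_sub9_idx ON issue ((substring(nodename from 1 for 9)));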

It struck me as odd that a count distinct would be far slower than selecting distinct rows themselves.  Apparently there are other workarounds people have come up with as well [1].  Just mentioning in case it's helpful.
Cheers!
-roger-


explains:

explain analyze select count(distinct substring(nodename from 1 for 9)) from issue;

Aggregate  (cost=222791.77..222791.78 rows=1 width=16) (actual time=190641.069..190641.071 rows=1 loops=1)
  ->  Seq Scan on issue  (cost=0.00..185321.51 rows=7494051 width=16) (actual time=0.016..3487.694 rows=7495551 loops=1)
Total runtime: 190641.182 ms



explain analyze select distinct  substring(nodename from 1 for 9) from issue;

HashAggregate  (cost=222791.77..222846.45 rows=4375 width=16) (actual time=6276.578..6278.004 rows=6192 loops=1)
  ->  Seq Scan on issue  (cost=0.00..204056.64 rows=7494051 width=16) (actual time=0.058..4293.976 rows=7495551 loops=1)
Total runtime: 6278.564 ms


explain analyze SELECT COUNT(*) FROM (SELECT DISTINCT substring(nodename from 1 for 9) FROM issue) as temp; 

Aggregate  (cost=222901.14..222901.15 rows=1 width=0) (actual time=5195.025..5195.025 rows=1 loops=1)
  ->  HashAggregate  (cost=222791.77..222846.45 rows=4375 width=16) (actual time=5193.121..5194.454 rows=6192 loops=1)
        ->  Seq Scan on issue  (cost=0.00..204056.64 rows=7494051 width=16) (actual time=0.035..3402.834 rows=7495551 loops=1)
Total runtime: 5195.160 ms

Can't drop a view ("view does not exist") that has an entry in "INFORMATION_SCHEMA.views"

Hi guys,
I need to drop a view in order to alter the type of a column from numeric(12,1) to numeric(12,2):

ERROR:  cannot alter type of a column used by a view or rule
DETAIL:  rule _RETURN on view "TransTasksCube" depends on column "billable_units"

Trying to delete it, I get:

projop=# drop view TransTasksCube;
ERROR:  view "transtaskscube" does not exist

FYI: the view also shows up in INFORMATION_SCHEMA.views:

projop=# select table_name from INFORMATION_SCHEMA.views where table_name like '%Trans%';
   table_name
----------------
 TransTasksCube
(1 row)
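
The error text shows the name folded to lowercase ("transtaskscube"), while the catalog shows a mixed-case name, so the view was most likely created with a quoted identifier and the name has to be quoted when dropping it, e.g.:

    drop view "TransTasksCube";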

Appreciate your help!
Klaus

Robert DiFalco | 17 Nov 18:55 2014

String searching

I notice there are several modules to create specialized indices in PostgreSQL for searching VARCHAR data.  For example, fuzzy, trigram, full text, etc.

I've been googling around but I can't find the optimal method (reasonable speed and size, simplicity) for my use case. 

My text searches will always look like the following. The user specifies a word (e.g. "John") and I have a field called "FullName" that could return records with "John Doe", "Robert Johnson", "Joe Johnson Smith", etc. I may also extend the search criteria to other fields. So the query would always look like this:

    SELECT * FROM MyTable WHERE upper(FullName) LIKE upper('%John%');

So you see it is also case insensitive. Pretty simple/standard stuff.

That said, which would be the best extension module to use? A "gist" index on the uppercased column? Or something else? Thanks!
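
One option that fits this kind of substring search is a trigram index; a minimal sketch, assuming the pg_trgm extension is available (the index name is just illustrative):

    CREATE EXTENSION IF NOT EXISTS pg_trgm;
    CREATE INDEX mytable_fullname_trgm_idx
        ON MyTable USING gin (upper(FullName) gin_trgm_ops);
    -- the original query can then use the index:
    SELECT * FROM MyTable WHERE upper(FullName) LIKE upper('%John%');
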
Rajesh K | 13 Nov 06:00 2014

Client Replication Login problem.

Hi,

I have a very basic question about PostgreSQL replication. After setting up replication, I am not able to log in as the postgres user on the client server, although there is no issue logging in on the server. (The Postgres replication manuals I followed are from the official website.) Answers are valuable.


Thanks & Regards

Rajesh.K
8129823272



VENKTESH GUTTEDAR | 17 Nov 12:23 2014

[general] Encrypting/Decryption

Hello,

    As I am new to PostgreSQL and Django, please help me to achieve the following.

    I have created a database with some tables and I want to encrypt the data in those tables, so could someone please guide me on how to do it?

    I want to encrypt and decrypt the data through Django, so please tell me which is the right way: pgcrypto or pycrypto?
    I don't understand either of them. Please help.
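
If the encryption happens in the database, a minimal pgcrypto sketch with a symmetric key would look like this (the table, column, and passphrase are just placeholders):

    CREATE EXTENSION IF NOT EXISTS pgcrypto;

    CREATE TABLE secrets (id serial PRIMARY KEY, payload bytea);

    -- encrypt on write
    INSERT INTO secrets (payload)
    VALUES (pgp_sym_encrypt('sensitive text', 'my-passphrase'));

    -- decrypt on read
    SELECT pgp_sym_decrypt(payload, 'my-passphrase') FROM secrets;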

--
Regards :
Venktesh Guttedar.

VENKTESH GUTTEDAR | 17 Nov 12:00 2014

ERROR: Corrupt ascii-armor

Hello,

    I am running PostgreSQL 9.3.5 on Ubuntu 14.04. I want to encrypt data while storing it and decrypt it while retrieving it, so I am using pgcrypto. I have generated the keys using gpg --gen-key and I have exported them as well.

I have created the table this way:

CREATE TABLE testuserscards(card_id SERIAL PRIMARY KEY, username varchar(100), cc bytea);

and when I try to insert data:

INSERT INTO testuserscards (username, cc)
SELECT robotccs.username, pgp_pub_encrypt(robotccs.cc, keys.pubkey) AS cc
FROM (VALUES ('robby', '41111111111111111'),
             ('artoo', '41111111111111112')) AS robotccs(username, cc)
CROSS JOIN (SELECT dearmor('-----BEGIN PGP PUBLIC KEY BLOCK-----
super publickey goobly gook goes here
-----END PGP PUBLIC KEY BLOCK-----') AS pubkey) AS keys;

I am getting the error: ERROR: Corrupt ascii-armor.

Please help me to encrypt the data.

Or is there any other way to achieve safe encryption and decryption?
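
For completeness, once the key block parses, decryption with the corresponding private key would look roughly like this (the key text and passphrase are placeholders):

SELECT username,
       pgp_pub_decrypt(cc, dearmor('-----BEGIN PGP PRIVATE KEY BLOCK-----
private key goes here
-----END PGP PRIVATE KEY BLOCK-----'), 'key passphrase') AS cc
FROM testuserscards;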
 
--
Regards :
Venktesh Guttedar.
Pawel Veselov | 17 Nov 09:55 2014

documentation for lo_* functions

Hi.

Where is the proper documentation for lo_* functions (e.g. lo_open) that are available as SQL functions? I see libpq functions documented in /static/lo-interfaces.html, but not the SQL ones (from pg_catalog). 

Thank you,
  Pawel.

Pawel Veselov | 17 Nov 09:52 2014

incremental digest (and other) functions?

Hi.

I was looking into computing a SHA digest of the contents of a "large object", and it seems that the digest (and other) functions provided by pgcrypto produce output based on a single string of input, without being able to accumulate over multiple data blocks. It would be nice to have incremental versions, IMHO. Or maybe I missed them?
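
For reference, the one-shot form that pgcrypto does provide looks like this (the literal here just stands in for the whole object's contents):

SELECT encode(digest('entire contents as a single value', 'sha256'), 'hex');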

Thank you,
  Pawel.

Gabriel Sánchez Martínez | 17 Nov 05:50 2014

invalid OID warning after disk failure

I'm running PostgreSQL 9.3.5 on Ubuntu 14.04 on x86_64.  The database 
directory is on linux mdadm RAID10, using 4 4TB disks and a far=2 
layout.  While the RAID tolerates 1 drive failure nicely, I had the 
misfortune of 2 drives failing consecutively, one of which had many 
sectors reallocated and began failing SMART criteria.  That one is out 
now.  As a result of this some files were corrupted.

I was getting the following errors on some tables:

ERROR:  could not read block 0 in file "base/27810/3995569": 
Input/output error

but after dropping those tables the errors are gone.

The situation appears to be stable now, but upon running REINDEX and 
VACUUM on one of the databases, I get the following:

WARNING:  relation "pg_attrdef" TID 1/1: OID is invalid
WARNING:  relation "pg_attrdef" TID 1/2: OID is invalid
WARNING:  relation "pg_attrdef" TID 1/3: OID is invalid
...

Should I drop the database and restore it from a backup?  My most recent 
backup is from late September, so I would lose some data.  I also backed 
up what I could as soon as the disks started giving errors, but I don't 
know if I can trust that.

Should I drop the entire cluster?

Regarding hardware, I'm going to add hot spare drives to prevent this 
from happening in the future.

Thanks in advance for your advice.

Regards,
Gabriel


Seamus Abshere | 17 Nov 00:52 2014

Comparing results of regexp_matches

hi,

I want to check if two similar-looking addresses have the same numbered
street, like 20th versus 21st.

    2033 21st Ave S
    2033 20th Ave S (they're different)

I get an error:

    # select regexp_matches('2033 21st Ave S', '\m(\d+(?:st|th))\M') =
    regexp_matches('2033 20th Ave S', '\m(\d+(?:st|th))\M');
    ERROR:  functions and operators can take at most one set argument

I've tried `()[1] == ()[1]`, etc. but the only thing that works is
making it into 2 subqueries:

    # select (select * from regexp_matches('2033 21st Ave S',
    '\m(\d+(?:st|th))\M')) = (select * from regexp_matches('2033 20th
    Ave S', '\m(\d+(?:st|th))\M'));
     ?column?
    ----------
     f
    (1 row)

Is there a more elegant way to compare the results of
`regexp_matches()`?
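
One simpler form, since substring(string from pattern) returns the portion matched by the first parenthesized subexpression, would be (a sketch of the same comparison):

    select substring('2033 21st Ave S' from '\m(\d+(?:st|th))\M')
         = substring('2033 20th Ave S' from '\m(\d+(?:st|th))\M');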

Thanks,
Seamus

-- 
Seamus Abshere, SCEA
https://github.com/seamusabshere
