David Clarke | 1 Jun 2006 13:05
Picon

Table design question

I'm reading Joe Celko's book SQL Programming Style for the second time
and although I've been an OO developer for quite a few years I'm
fairly green wrt SQL. Joe is obviously something of a curmudgeon and I
would fall squarely into his newbie OO developer ordinal scale and I'm
trying to avoid the slide into stupid newbie OO developer.

So I'm designing a table and I'm looking for an appropriate key. The
natural key is a string from a few characters up to a maximum of
perhaps 100. Joe gets quite fierce about avoiding the use of a serial
id column as a key. The string is unique in the table and fits the
criteria for a key. So should I follow Joe's advice and use my natural
key as the primary key? It sounds reasonable but it will mean at least
one other table will have the string as a foreign key. My postgres
intro book has id columns all over the place but is it really that big
an issue these days to have a 100 character primary key? Are there
postgres-specific implications for either approach?

Thanks
Dave

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Miroslav Šulc | 1 Jun 2006 15:26
Picon

SELECT DISTINCT too slow

Hello,

I have a table with cca 400,000 rows. The table contains column "key" of varchar(20) type containing 10 distinct values. I want to get out what distinct values are present in the column. I use this simple query, which is very slow:

SELECT DISTINCT Key FROM MRTPContactValue

Here is the query plan:

QUERY PLAN
Unique (cost=64882.26..66964.59 rows=9 width=9) (actual time=26139.972..29593.164 rows=10 loops=1)
-> Sort (cost=64882.26..65923.43 rows=416466 width=9) (actual time=26139.964..27975.944 rows=416466 loops=1)
Sort Key: "key"
-> Seq Scan on mrtpcontactvalue (cost=0.00..8669.66 rows=416466 width=9) (actual time=0.026..2460.535 rows=416466 loops=1)
Total runtime: 29603.159 ms

I've tried index on the "key" column but no improvement.

Is there a way to speed the SELECT up?

Thank you for any suggestions.
-- Miroslav Šulc
Attachment (miroslav.sulc.vcf): text/x-vcard, 349 bytes

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to majordomo <at> postgresql.org so that your
       message can get through to the mailing list cleanly
Rod Taylor | 1 Jun 2006 15:41
Picon

Re: Table design question

> So I'm designing a table and I'm looking for an appropriate key. The
> natural key is a string from a few characters up to a maximum of
> perhaps 100. Joe gets quite fierce about avoiding the use of a serial
> id column as a key. The string is unique in the table and fits the
> criteria for a key. So should I follow Joe's advice and use my natural
> key as the primary key? It sounds reasonable but it will mean at least
> one other table will have the string as a foreign key. My postgres

Unfortunately as a result of the implementation of most commercial
databases (and PostgreSQL), using a string as the primary key will cause
a performance hit.

Yes, normally it is better and for things like status flags and other
constant values that rarely change it is a good approach.

I would tend to add a SERIAL and make it the primary key (using it in
foreign tables) but also making the string column unique and not
null'able.

Essentially it gives the table two primary keys in the hope that some
day PostgreSQL will add a layer of abstraction between primary/foreign
key lookups and presentation that allows for usage of strings directly
without a performance hit.
--

-- 

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Alvaro Herrera | 1 Jun 2006 15:53
Favicon
Gravatar

Re: SELECT DISTINCT too slow

Miroslav ?ulc wrote:
> Hello,
> 
> I have a table with cca 400,000 rows. The table contains column "key" of
> varchar(20) type containing 10 distinct values. I want to get out what
> distinct values are present in the column. I use this simple query,
> which is very slow:
> 
> SELECT DISTINCT Key FROM MRTPContactValue

You could get the universe of values from the table where this is a
primary key, and use an IN clause (which apparently is more efficient
than an EXISTS in some cases, but try that too) to search for values
that exist in the MRTPContactValue table.

I assume you do have the other table, don't you?

--

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Miroslav Šulc | 1 Jun 2006 16:49
Picon

Re: SELECT DISTINCT too slow

Well, "key" is not primary key from another table. It is just a column in pair "key" => "value".
The structure of the table is this:

Id (primary key)
MRTPContactId (id of contact from table MRTPContact)
Key (key from pair key => value)
Value (value from pair key => value)

So I want the get the list of keys used in the table.

Miroslav Šulc

Alvaro Herrera napsal(a):
Miroslav ?ulc wrote:
Hello, I have a table with cca 400,000 rows. The table contains column "key" of varchar(20) type containing 10 distinct values. I want to get out what distinct values are present in the column. I use this simple query, which is very slow: SELECT DISTINCT Key FROM MRTPContactValue
You could get the universe of values from the table where this is a primary key, and use an IN clause (which apparently is more efficient than an EXISTS in some cases, but try that too) to search for values that exist in the MRTPContactValue table. I assume you do have the other table, don't you?
Attachment (miroslav.sulc.vcf): text/x-vcard, 349 bytes

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings
Alvaro Herrera | 1 Jun 2006 16:53
Favicon
Gravatar

Re: SELECT DISTINCT too slow

Miroslav ?ulc wrote:
> Well, "key" is not primary key from another table. It is just a column
> in pair "key" => "value".
> The structure of the table is this:
> 
> Id (primary key)
> MRTPContactId (id of contact from table MRTPContact)
> Key (key from pair key => value)
> Value (value from pair key => value)
> 
> So I want the get the list of keys used in the table.

The plan you get is the most efficient possible for that query.  If you
had a table of possible keys (which should of course be FK of "Key"),
you could get a much faster version :-)

--

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

               http://www.postgresql.org/docs/faq

Miroslav Šulc | 1 Jun 2006 17:03
Picon

Re: SELECT DISTINCT too slow

It might be a good solution :-)

Thank you for your help.

Miroslav Šulc

Alvaro Herrera napsal(a):
> Miroslav ?ulc wrote:
>   
>> Well, "key" is not primary key from another table. It is just a column
>> in pair "key" => "value".
>> The structure of the table is this:
>>
>> Id (primary key)
>> MRTPContactId (id of contact from table MRTPContact)
>> Key (key from pair key => value)
>> Value (value from pair key => value)
>>
>> So I want the get the list of keys used in the table.
>>     
>
> The plan you get is the most efficient possible for that query.  If you
> had a table of possible keys (which should of course be FK of "Key"),
> you could get a much faster version :-)
>
>   
Attachment (miroslav.sulc.vcf): text/x-vcard, 349 bytes

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match
Tom Lane | 1 Jun 2006 17:20
Picon

Re: SELECT DISTINCT too slow

=?ISO-8859-2?Q?Miroslav_=A9ulc?= <miroslav.sulc <at> startnet.cz> writes:
> I have a table with cca 400,000 rows. The table contains column "key" of
> varchar(20) type containing 10 distinct values. I want to get out what
> distinct values are present in the column. I use this simple query,
> which is very slow:

> SELECT DISTINCT Key FROM MRTPContactValue

Try
	SELECT Key FROM MRTPContactValue GROUP BY Key

The "select distinct" code is a bit old and crufty, GROUP BY is usually
smarter.

			regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to majordomo <at> postgresql.org so that your
       message can get through to the mailing list cleanly

Miroslav Šulc | 1 Jun 2006 17:39
Picon

Re: SELECT DISTINCT too slow

The GROUP BY is really fast :-)

Thank you.

Miroslav Šulc

Tom Lane napsal(a):
> Try
> 	SELECT Key FROM MRTPContactValue GROUP BY Key
>
> The "select distinct" code is a bit old and crufty, GROUP BY is usually
> smarter.
>
> 			regards, tom lane
>   
Attachment (miroslav.sulc.vcf): text/x-vcard, 349 bytes

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match
Greg Stark | 1 Jun 2006 17:40
Picon
Favicon

Re: Table design question


"David Clarke" <pigwin32 <at> gmail.com> writes:

> is it really that big an issue these days to have a 100 character primary
> key? Are there postgres-specific implications for either approach?

It's exactly the same size issue as ever. A 20% increase in space usage is a
20% performance hit in certain types of queries regardless of how fast or
cheap your hardware has become.

This is an issue where reasonable people differ quite vociferously so you may
get contradictory responses. But really it depends heavily on the exact
circumstances and requires a judgement call based on experience. Any time
someone tries to sum it up with a blanket rule it's going to be wrong some of
the time.

But that said I also tend to tilt towards creating serial ids. Unless the
string is already a synthetic unique identifier created by your application
you can't really trust its "uniqueness" for all time. Plenty of people have
built databases using natural unique keys that turned out to be not so unique
or turned out to just be a poor choice for external reasons (think of Social
Security Numbers, for example).

I've had my string unique identifiers corrupted by uppercasing, declared case
insensitive, declared private information that couldn't be leaked, and
declared offensive words that had to be updated. Each of which is a pain to
deal with when it's your primary key.

--

-- 
greg

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster


Gmane