Josh Berkus | 1 Nov 01:01 2004

Re: Speeding up Gist Index creations

Mallah,

> GiST indexes take a long time to create compared to normal indexes.
> Is there any way to speed them up?
>
> (for example by modifying sort_mem or something, temporarily)

More sort_mem will indeed help.  So will more RAM and a faster CPU.
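
For example (a sketch only; the right value depends on your RAM,
sort_mem is per-sort and measured in kilobytes, and the table and index
names here are made up):

    SET sort_mem = 262144;   -- ~256MB, for this session only
    CREATE INDEX idx_docs_fts ON docs USING gist (fts_vector);
    RESET sort_mem;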

Our GiST-index-creation process is probably not optimized; this has been 
marked as "needs work" in the code for several versions.   If you know anyone 
who can help Oleg & Teodor out, please put them in touch ...

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
      joining column's datatypes do not match

Gaetano Mendola | 1 Nov 02:11 2004

Re: [PATCHES] [HACKERS] ARC Memory Usage analysis

Thomas F. O'Connell wrote:
> 
> As a result, I was intending to inflate the value of 
> effective_cache_size to closer to the amount of unused RAM on some of 
> the machines I admin (once I've verified that they all have a unified 
> buffer cache). Is that correct?
> 

effective_cache_size is IMHO a "bogus" parameter in postgresql.conf,
because:

1) That parameter does not instruct postgres to use that RAM; it is
    only a hint to the engine about how much OS cache memory the "DBA"
    *believes* is available to postgres.
2) This parameter changes only the cost evaluation of plans (and not by
    much).

So don't expect that doubling this parameter will push postgres to use
more RAM.
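
You can check this yourself (a sketch; big_table is a made-up name, and
the setting is expressed in 8kB disk pages):

    SET effective_cache_size = 10000;    -- ~80MB
    EXPLAIN SELECT * FROM big_table WHERE id < 1000;
    SET effective_cache_size = 100000;   -- ~800MB
    EXPLAIN SELECT * FROM big_table WHERE id < 1000;

Only the estimated costs (and possibly the chosen plan) change; actual
memory usage does not.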

Regards
Gaetano Mendola

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

D'Arcy J.M. Cain | 1 Nov 13:59 2004

Re: Thanks Chariot Solutions

On Sun, 31 Oct 2004 13:38:55 -0500 (EST)
brew <at> theMode.com wrote:
> 
> Many thanks to Chariot Solutions, http://chariotsolutions.com, for
> hosting Bruce Momjian giving one of his PostgreSQL seminars outside of
> Philadelphia, PA yesterday. There were about sixty folks there, one
> person driving from Toronto and another coming from California (!).

Seconded.  It was definitely worth the drive from Toronto.

-- 
D'Arcy J.M. Cain <darcy <at> druid.net>         |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Gavin Sherry | 1 Nov 14:45 2004

Re: psql large RSS (1.6GB)

On Sat, 30 Oct 2004, Dustin Sallings wrote:

> >   If the solution is to just write a little client that uses perl
> > DBI to fetch rows one at a time and write them out, that's doable,
> > but it would be nice if psql could be made to "just work" without
> > the monster RSS.
>
> 	It wouldn't make a difference unless that driver implements the
> underlying protocol on its own.

Even though we can tell people to make use of cursors, it seems that
memory usage for large result sets should be addressed. A quick search of
the archives does not reveal any discussion about having libpq spill to
disk if a result set reaches some threshold. Has this been canvassed in
the past?
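
(For reference, the cursor workaround looks like this, a sketch with a
made-up table name:

    BEGIN;
    DECLARE c CURSOR FOR SELECT * FROM huge_table;
    FETCH FORWARD 1000 FROM c;   -- repeat until no rows are returned
    CLOSE c;
    COMMIT;

Each FETCH then holds only one batch in client memory.)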

Thanks,

Gavin

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Bruce Momjian | 1 Nov 15:04 2004

Re: psql large RSS (1.6GB)

Gavin Sherry wrote:
> On Sat, 30 Oct 2004, Dustin Sallings wrote:
> 
> > >   If the solution is to just write a little client that uses perl
> > > DBI to fetch rows one at a time and write them out, that's doable,
> > > but it would be nice if psql could be made to "just work" without
> > > the monster RSS.
> >
> > 	It wouldn't make a difference unless that driver implements the
> > underlying protocol on its own.
> 
> Even though we can tell people to make use of cursors, it seems that
> memory usage for large result sets should be addressed. A quick search of
> the archives does not reveal any discussion about having libpq spill to
> disk if a result set reaches some threshold. Has this been canvassed in
> the past?

No, I don't remember hearing this discussed and I don't think most
people would want libpq spilling to disk by default.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman <at> candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Tom Lane | 1 Nov 17:03 2004

Re: psql large RSS (1.6GB)

Bruce Momjian <pgman <at> candle.pha.pa.us> writes:
> No, I don't remember hearing this discussed and I don't think most
> people would want libpq spilling to disk by default.

Far more useful would be some sort of streaming API to let the
application process the rows as they arrive, or at least fetch the rows
in small batches (the V3 protocol supports the latter even without any
explicit use of a cursor).  I'm not sure if this can be bolted onto the
existing libpq framework reasonably, but that's the direction I'd prefer
to go in.

			regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
    (send "unregister YourEmailAddressHere" to majordomo <at> postgresql.org)

Anjan Dave | 1 Nov 18:37 2004

shared_buffers and Shared Memory Segments

Hello,

I am trying to understand the output of the 'ipcs' command during peak
activity and how I can use it to possibly tune shared_buffers.

Here's what I see right now with ipcs -m (host is RHAS 3.0):

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x0052e2c1 1966080    postgres   600        92078080   322

What is nattch? Is this the number of segments attached? Is it saying
that about 92MB is used out of 512MB?

- Shared memory segment size is defined to be 512MB
- Currently, shared_buffers is at 80MB (10240)
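
(My arithmetic, if I have it right: 10240 buffers x 8kB = 80MB, so the
92,078,080-byte segment would be shared_buffers plus PostgreSQL's other
shared structures, with 512MB being only the upper limit rather than an
allocation.)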

Here's the 'top' output:

12:29:42  up 24 days, 15:04,  6 users,  load average: 2.28, 1.07, 1.07
421 processes: 414 sleeping, 3 running, 4 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total   83.6%    0.0%   40.8%   0.0%     7.6%   76.4%  190.0%
           cpu00   20.9%    0.0%    9.0%   0.3%     0.1%   22.5%   46.8%
           cpu01   19.2%    0.0%   10.6%   0.0%     7.3%   14.4%   48.3%
           cpu02   15.0%    0.0%    7.3%   0.0%     0.0%    8.6%   68.9%
           cpu03   28.6%    0.0%   14.0%   0.0%     0.1%   31.0%   26.0%
Mem:  7973712k av, 7675856k used,  297856k free,       0k shrd,  149220k buff
                   3865444k actv, 2638404k in_d,  160092k in_c
Swap: 4096532k av,      28k used, 4096504k free                 6387092k cached

Can I conclude anything from these outputs and the buffer setting?

Appreciate any thoughts.

Thanks,
Anjan

Josh Berkus | 1 Nov 19:49 2004

Re: psql large RSS (1.6GB)

Tom,

> Far more useful would be some sort of streaming API to let the
> application process the rows as they arrive, or at least fetch the rows
> in small batches (the V3 protocol supports the latter even without any
> explicit use of a cursor).  I'm not sure if this can be bolted onto the
> existing libpq framework reasonably, but that's the direction I'd prefer
> to go in.

I think that TelegraphCQ incorporates this.   However, I'm not sure whether 
it's a portable component; it may be too tied in to their streaming query 
engine.   They have talked about porting their "background query" patch for 
PSQL, though ...

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

               http://www.postgresql.org/docs/faqs/FAQ.html

Alvaro Nunes Melo | 1 Nov 22:40 2004

Performance difference when using views

Hi,

I have some views that are used to make some queries simpler. But when
I use them there is a performance loss, because the query doesn't use
indexes anymore. Below I'm sending the query with and without the view,
the execution times, the EXPLAIN plans, and the view's body. I don't
understand why the performance is so different (20x in elapsed time,
1000x in estimated cost) when the queries are semantically identical.

Shouldn't I use views in situations like this? Is there some way to use
the view and still get the indexes?
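
One idea I had (just a sketch, reusing the tables from the view below;
note the adress_type filter has to move into the join condition) is to
fold person into the view itself, so a restriction on person_id can
reach the indexes:

CREATE VIEW vw_test2 AS
SELECT p.person_id, ci.city_id, ci.city_name, s.state_id,
       s.state_acronym
  FROM person p
       LEFT OUTER JOIN address a ON a.person_id = p.person_id
                                AND a.adress_type = 2
       LEFT OUTER JOIN zip zp ON a.zip_code_id = zp.zip_code_id
       LEFT OUTER JOIN city ci ON ci.city_id = zp.city_id
       LEFT OUTER JOIN state s ON ci.state_id = s.state_id;

SELECT * FROM vw_test2 WHERE person_id = 19257;

But I'd still like to understand the difference.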

--------------
-- View body
--------------

CREATE VIEW vw_test AS
SELECT a.person_id, ci.city_id, ci.city_name, s.state_id,
s.state_acronym
  FROM address a
       LEFT OUTER JOIN zip zp ON a.zip_code_id = zp.zip_code_id
       LEFT OUTER JOIN city ci ON ci.city_id = zp.city_id
       LEFT OUTER JOIN state s ON ci.state_id = s.state_id
 WHERE a.adress_type = 2;

---------------------
-- Without the view
---------------------

SELECT p.person_id, ci.city_id, ci.city_name, u.state_id,
u.state_acronym
  FROM person p
       LEFT OUTER JOIN address e USING (person_id)
       LEFT OUTER JOIN zip zp ON e.zip_code_id = zp.zip_code_id
       LEFT OUTER JOIN city ci ON ci.city_id = zp.city_id
       LEFT OUTER JOIN state u ON ci.state_id = u.state_id
 WHERE e.adress_type = 2
 AND p.person_id = 19257;

 person_id | city_id   | city_name | state_id | state_acronym
-----------+-----------+-----------+----------+---------------
     19257 |     70211 | JAGUARAO  |       22 | RS
(1 record)
Time: 110,047 ms

QUERY PLAN
---------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.00..20.04 rows=1 width=33)
   Join Filter: ("outer".state_id = "inner".state_id)
   ->  Nested Loop Left Join  (cost=0.00..18.43 rows=1 width=27)
         ->  Nested Loop Left Join  (cost=0.00..13.87 rows=1 width=8)
               ->  Nested Loop  (cost=0.00..10.75 rows=1 width=8)
                     ->  Index Scan using pk_person on person p
(cost=0.00..5.41 rows=1 width=4)
                           Index Cond: (person_id = 19257)
                     ->  Index Scan using un_address_adress_type on
address e  (cost=0.00..5.33 rows=1 width=8)
                           Index Cond: (19257 = person_id)
                           Filter: (adress_type = 2)
               ->  Index Scan using pk_zip on zip zp  (cost=0.00..3.11
rows=1 width=8)
                     Index Cond: ("outer".zip_code_id = zp.zip_code_id)
         ->  Index Scan using pk_city on city ci  (cost=0.00..4.55
rows=1 width=23)
               Index Cond: (ci.city_id = "outer".city_id)
   ->  Seq Scan on state u  (cost=0.00..1.27 rows=27 width=10)
(15 records)

---------------------
-- With the view
---------------------

SELECT p.person_id, t.city_id, t.city_name, t.state_id, t.state_acronym
  FROM person p
     LEFT OUTER JOIN vw_test t USING (person_id)
 WHERE p.person_id = 19257;

 person_id | city_id   | city_name | state_id | state_acronym
-----------+-----------+-----------+----------+--------------
     19257 |     70211 | JAGUARAO  |       22 | RS
(1 record)
Time: 1982,743 ms

QUERY PLAN
---------------------------------------------------------------------
 Nested Loop Left Join  (cost=10921.71..28015.63 rows=1 width=33)
   Join Filter: ("outer".person_id = "inner".person_id)
   ->  Index Scan using pk_person on person p  (cost=0.00..5.41 rows=1
width=4)
         Index Cond: (person_id = 19257)
   ->  Hash Left Join  (cost=10921.71..27799.55 rows=16854 width=33)
         Hash Cond: ("outer".state_id = "inner".state_id)
         ->  Hash Left Join  (cost=10920.38..27545.40 rows=16854
width=27)
               Hash Cond: ("outer".city_id = "inner".city_id)
               ->  Hash Left Join  (cost=10674.20..26688.88 rows=16854
width=8)
                     Hash Cond: ("outer".zip_code_id =
"inner".zip_code_id)
                     ->  Seq Scan on address e  (cost=0.00..1268.67
rows=16854 width=8)
                           Filter: (adress_type = 2)
                     ->  Hash  (cost=8188.36..8188.36 rows=387936
width=8)
                           ->  Seq Scan on zip zp  (cost=0.00..8188.36
rows=387936 width=8)
               ->  Hash  (cost=164.94..164.94 rows=9694 width=23)
                     ->  Seq Scan on city ci  (cost=0.00..164.94
rows=9694 width=23)
         ->  Hash  (cost=1.27..1.27 rows=27 width=10)
               ->  Seq Scan on state u  (cost=0.00..1.27 rows=27
width=10)
(18 records)

Best regards,

-- 
+---------------------------------------------------+
|  Alvaro Nunes Melo    Atua Sistemas de Informacao |
| al_nunes <at> atua.com.br        www.atua.com.br       |
|    UIN - 42722678            (54) 327-1044        |
+---------------------------------------------------+

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
    (send "unregister YourEmailAddressHere" to majordomo <at> postgresql.org)

Simon Riggs | 1 Nov 23:03 2004

Re: [PERFORM] [PATCHES] ARC Memory Usage analysis

On Wed, 2004-10-27 at 01:39, Josh Berkus wrote:
> Thomas,
> 
> > As a result, I was intending to inflate the value of
> > effective_cache_size to closer to the amount of unused RAM on some of
> > the machines I admin (once I've verified that they all have a unified
> > buffer cache). Is that correct?
> 
> Currently, yes.  

I now believe the answer to that is "no, that is not fully correct",
following investigation into how to set that parameter correctly.

> Right now, e_c_s is used just to inform the planner and make 
> index vs. table scan and join order decisions.

Yes, I agree that is what e_c_s is used for.

...let's go deeper:

effective_cache_size is used to calculate the number of I/Os required to
index scan a table, which varies according to the size of the available
cache (whether this be OS cache or shared_buffers). The reason to do
this is because whether a table is in cache can make a very great
difference to access times; *small* tables tend to be the ones that vary
most significantly. PostgreSQL currently uses the Mackert and Lohman
[1989] equation to assess how much of a table is in cache in a blocked
DBMS with a finite cache. 
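
For reference, here is the equation as I transcribe it from the
planner's cost code (costsize.c); my notation, so treat it with care.
With T = # pages in the table, N = # tuples, s = selectivity, and
b = effective_cache_size (in pages), the estimated page fetches PF are:

    PF = min(2TNs/(2T+Ns), T)            when T <= b
         2TNs/(2T+Ns)                    when T > b and Ns <= 2Tb/(2T-b)
         b + (Ns - 2Tb/(2T-b))*(T-b)/T   when T > b and Ns > 2Tb/(2T-b)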

The Mackert and Lohman equation is accurate, as long as the parameter b
is reasonably accurately set. [I'm discussing only the current behaviour
here, not what it can or should or could be] If it is incorrectly set,
then the equation will give the wrong answer for small tables. The same
answer (i.e. same asymptotic behaviour) is returned for very large
tables, but they are the ones we didn't worry about anyway. Getting the
equation wrong means you will choose sub-optimal plans, potentially
reducing your performance considerably.

As I read it, effective_cache_size is equivalent to the parameter b,
defined as (p.3) "minimum buffer size dedicated to a given scan". M&L
point out (p.3) "We...do not consider interactions of
multiple users sharing the buffer for multiple file accesses". 

Either way, M&L aren't talking about "the total size of the cache",
which we would interpret to mean shared_buffers + OS cache, in our
effort to not forget the beneficial effect of the OS cache. They use the
phrase "dedicated to a given scan"....

AFAICS "effective_cache_size" should be set to a value that reflects how
many other users of the cache there might be. If you know for certain
you're the only user, set it according to the existing advice. If you
know you aren't, then set it an appropriate factor lower. Setting that
accurately on a system-wide basis will clearly be difficult, and
setting it high will often be inappropriate.
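
(A sketch of what I mean, with invented numbers: on a box with 8GB of
RAM where the OS caches roughly 6GB, a DBA expecting about three
concurrent large scans might set effective_cache_size nearer
6GB/3 = 2GB, i.e. 262144 8kB pages, rather than the full 6GB.)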

The manual is not clear as to how to set effective_cache_size. Other
advice misses the effect of the many-scans/many-tables issue and will
give the wrong answer for many calculations, thus producing incorrect
plans for 8.0 (and earlier releases also).

This is something that calls for documentation rather than a bug fix.
It's a complex one, so I'll await all of your objections before I write
a new doc patch.

[Anyway, I do hope I've missed something somewhere in all that, though
I've read their paper twice now. Fairly accessible, but requires
interpretation to the PostgreSQL case. Mackert and Lohman [1989] "Index
Scans using a finite LRU buffer: A validated I/O model"]

> The problem which Simon is bringing up is part of a discussion about doing 
> *more* with the information supplied by e_c_s.    He points out that it's not 
> really related to the *real* probability of any particular table being 
> cached.   At least, if I'm reading him right.

Yes, that was how Jan originally meant to discuss it, but not what I meant.

Best regards,

Simon Riggs

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

