Federico Di Gregorio | 3 Mar 10:49 2010

Re: Deprecated versions

On 28/02/2010 14:18, Charlie Clark wrote:
> On 28.02.2010 at 12:00, <psycopg-request@...> wrote:
> 
>> psycopg2 releases are not really "supported". If there are bugs they get
>> fixed and new features creep in but I don't have the time to backport
>> fixes to old versions, so the support is more in the form of "please
>> install the latest version".
> 
> Okay, it's probably best to put the requirement into the setup.py and
> remove it from the Zope stuff altogether. It doesn't really make sense
> to check compatibility at runtime.

So you suggest removing the version check in DA.py?

federico

-- 
Federico Di Gregorio    <mailto:fog@...> <jid:fog@...>
DISCLAIMER. If I receive a message from you, you are agreeing that:
 1. I am by definition, "the intended recipient".
 2. All information in the email is mine to do with as I see fit and
 make such financial profit, political mileage, or good joke as it lends
 itself to. In particular, I may quote it on USENET or the WWW.
 3. I may take the contents as representing the views of your company.
 4. This overrides any disclaimer or statement of confidentiality that
 may be included on your message.


Michael Bayer | 3 Mar 16:03 2010

congrats on the website !

Just noticed psycopg2 finally got themselves a real site with real docs!
This is terrific, nice job to everyone involved.

- mike
Brian Jones | 3 Mar 17:16 2010

expected behavior of fetchmany()

Hi all,

First, a hearty congrats on the new website, and the documentation. I've used psycopg2 on and off for perhaps 4 years, and yesterday, when I first read through the docs, I discovered so much stuff I never knew. Can't thank you enough. Keep it up.

Second, the docs didn't answer this question :) 
I have a query that returns millions of rows. I *was* wrapping the call in a Python generator that uses fetchmany(), but that's causing issues with big result sets because fetchmany() doesn't create a new transaction for each of its requests for the next chunk of data (more detail below). Here's what my generator looks like right now:

def get_tons_o_data(self, params, limit=1000):
    cur = self.conn.cursor()
    cur.arraysize = limit
    try:
        cur.callproc('sch.get_metric_asston_o_data', params)  # params is a list
        while True:
            rs = cur.fetchmany(limit)
            if not rs:
                break
            for row in rs:
                yield row
    except Exception as out:
        pass  # <exception handling code here>

This works really well for smaller result sets and provides a nice interface for the code that uses it (for row in get_tons_o_data(...)). The problem comes when I have millions of rows in the result set: fetchmany() holds a single transaction open until it has fetched the entire result set, instead of opening and closing a transaction for each chunk of data it requests.

I imagine this is probably expected behavior, but with so many rows being returned, the transaction has the potential to stay open so long that it keeps some database maintenance operations from running (some operations in pgsql won't run if there's an open transaction, and won't kill the transaction either).

I guess I was looking more for something that would abstract out a sort of 'limit/offset' loop for me. I'm updating the db function to take a limit and offset, and then I'll have to create a loop to a) get data b) process data c) update limit/offset d) rinse and repeat

I just wanted to make sure I didn't miss some magical setting that would cause fetchmany() to request each chunk of data in a separate transaction before I started making alterations to my code.

Thanks all.
brian

--
Brian K. Jones
Python Magazine  http://www.pythonmagazine.com
My Blog          http://www.protocolostomy.com

Brian Jones | 3 Mar 17:27 2010

Re: expected behavior of fetchmany()

Thanks for the reply, Ben - please remember to cc/reply all to the list!

As for server-side cursors, it's not clear that they solve the problem of holding a single transaction open; I imagine a server-side cursor performs all of its work in a single transaction. I admit to being more familiar with pgsql than psycopg2, so enlightenment is hereby solicited.

brian

On Wed, Mar 3, 2010 at 11:18 AM, Ben DeMott <ben.demott-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
Or you could use a Server-Side cursor, is this out of the question?

On Wed, Mar 3, 2010 at 11:16 AM, Brian Jones <bkjones-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
[snip]

--
Brian K. Jones
Python Magazine  http://www.pythonmagazine.com
My Blog          http://www.protocolostomy.com
Federico Di Gregorio | 3 Mar 17:28 2010

Re: expected behavior of fetchmany()

On 03/03/2010 17:16, Brian Jones wrote:
> First, a hearty congrats on the new website, and the documentation. I've
> used psycopg2 on and off for perhaps 4 years, and I discovered so much stuff
> I never knew yesterday when I first discovered the docs. Can't thank you
> enough. Keep it up.

Thanks.

> Second, the docs didn't answer this question :)
> I have a query that returns millions of rows. I *was* wrapping the call in a
> python generator that uses fetchmany(), but that's causing issues with big
> result sets because fetchmany() doesn't create a new transaction for each of
> its requests for the next chunk of data (more detail below)
[snip]
> I just wanted to make sure I didn't miss some magical setting that would
> cause fetchmany() to request each chunk of data in a separate transaction
> before I started making alterations to my code.

You can't have the results returned in chunks in different
transactions: the whole query needs to run in a single transaction.

First of all, try to use a named cursor: this won't solve the problem on
the server but will make the client run faster, because fetchmany()
always fetches and caches the whole dataset in memory UNLESS a named
cursor is used. So, in fact, your current code is just a long way to write:

curs.fetchall()
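
(For illustration, a minimal untested sketch of the generator rewritten with a
named cursor; the cursor name and the SELECT wrapper around your function are
my assumptions:)

def get_tons_o_data(self, params, limit=1000):
    cur = self.conn.cursor('tons_o_data')  # any unique name => server-side
    try:
        # one %s per element of params (two assumed here)
        cur.execute("SELECT * FROM sch.get_metric_asston_o_data(%s, %s)",
                    params)
        while True:
            rs = cur.fetchmany(limit)  # pulls only "limit" rows per round trip
            if not rs:
                break
            for row in rs:
                yield row
    finally:
        cur.close()
        self.conn.commit()  # end the transaction once iteration finishes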

Also, if you really don't care about inconsistent results, you can
re-issue the query multiple times using increasing values of OFFSET and
LIMIT, then call:

data = curs.fetchall()
conn.commit()

before iterating over data, to make sure you cache the results in memory
and end each chunk's transaction as soon as possible.
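
(Something like this untested sketch, assuming the db function grows the
limit/offset arguments you mentioned; process() is a placeholder:)

offset = 0
while True:
    curs.execute("SELECT * FROM sch.get_metric_asston_o_data(%s, %s)",
                 (limit, offset))
    data = curs.fetchall()
    conn.commit()        # each chunk runs in its own short transaction
    if not data:
        break
    for row in data:
        process(row)     # placeholder for the per-row work
    offset += limit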

You can also try to put the connection in autocommit mode, but I don't know
what that will change with respect to your maintenance problems (can you give details?)

federico

-- 
Federico Di Gregorio                                       fog@...
                           There's no certainty - only opportunity. -- V

Ben DeMott | 3 Mar 18:25 2010

Re: expected behavior of fetchmany()

If you have a result set that is by definition "large" and you
1.) cannot know the absolute resources available or the destination server's architecture and configuration, and
2.) cannot know the absolute size (or potential size) of the database or its memory impact ahead of time,

then there is no reason not to use a server-side cursor.

You should periodically commit the transaction if the overall running time is an issue.
You can do this and still keep re-using the server-side cursor by specifying "WITH HOLD".
See http://www.postgresql.org/docs/current/static/sql-declare.html
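
(A rough, untested sketch of the manual DECLARE ... WITH HOLD approach; the
cursor name, chunk size and query are placeholders:)

cur = conn.cursor()
cur.execute("DECLARE big_cur CURSOR WITH HOLD FOR SELECT * FROM hugetable")
conn.commit()            # WITH HOLD lets the cursor survive this commit
while True:
    cur.execute("FETCH FORWARD 1000 FROM big_cur")
    rows = cur.fetchall()
    if not rows:
        break
    for row in rows:
        pass             # process each row here
    conn.commit()        # end each chunk's transaction
cur.execute("CLOSE big_cur")
conn.commit()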

Postgres (the database) is VASTLY more efficient at storing and paging a result set internally in a cursor than any driver or client could ever hope to be.
In fact Postgres has the most advanced map/reduce algorithms in place for joins and table paging of any database I know of, but if you don't let it, it can't use that intelligence to keep memory consumption low and efficiency high.

So it's much less of a load on the database to let it do its thing.
Also, if other things are "happening" against the table, in most cases the desired behavior is to see only the records that were present at the time your SELECT started.

With a server-side cursor you CAN get more than 1 record returned at a time, and you can even skip over records.

If you need to perform inserts in between your long-running server-side cursor's fetches, you simply need to open another cursor; with psycopg2 you don't need to use a second connection (as you do with python-mysql).

If you want to see some conditional insert offset / commit code, I can show you some.
I wrote some really gross-looking code before I discovered that you could use fetchmany() to get more than one result from a server-side cursor, and scroll() to advance it.
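
(Roughly like this untested sketch; the cursor name, query and numbers are
made up:)

cur = conn.cursor('skipper')  # named => server-side
cur.execute("SELECT * FROM hugetable ORDER BY id")
cur.scroll(50000)             # advance the server-side cursor 50000 rows
chunk = cur.fetchmany(1000)   # then pull the next 1000 rows in one round trip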

If you are worried about blocking (you need to perform other actions while the cursor is open) you can write an asynchronous query... but using a server-side cursor mostly solves this.

If you are performing some crazy complex operation against Postgres that returns a result set you intend to interact with later, you would be much better off first performing:

INSERT INTO (table / TEMP table) SELECT crazy, complex, query FROM hugetable WHERE arbitrarysums, joins, conditions
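
(Concretely, something like this sketch; the column, table and condition
names are made up:)

cur = conn.cursor()
cur.execute("CREATE TEMP TABLE crazy_results AS "
            "SELECT crazy, complex, query FROM hugetable WHERE some_condition")
conn.commit()  # the temp table persists for the session (ON COMMIT PRESERVE ROWS)
# ...then interact with crazy_results via a named cursor or LIMIT/OFFSET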


For some examples, feel free to check out this wiki page I recently created (the last example shows a server-side cursor):
http://wiki.postgresql.org/wiki/Using_psycopg2_with_PostgreSQL


import psycopg2.extras

ss_cursor = conn.cursor('named_cursor', cursor_factory=psycopg2.extras.DictCursor)
ss_cursor.execute("SELECT * FROM hugetable")  # named cursor => executed server-side
for row in ss_cursor:
    pass  # do stuff

On Wed, Mar 3, 2010 at 11:28 AM, Federico Di Gregorio <fog <at> initd.org> wrote:
[snip]

Robert Coup | 3 Mar 21:10 2010

Re: expected behavior of fetchmany()

I think what Ben is trying to say is that server-side cursors in
Postgres do not build the entire resultset in memory, on either the
server or the client side. As you fetch more from the cursor,
additional rows are retrieved from the server as needed.

As Federico said, fetchmany() does the same as fetchall() for a
client-side cursor: the entire resultset is built in memory on the
server and then transferred to the client, regardless of how many or
which records you access.

So you want a server-side (named) cursor :) See
http://initd.org/psycopg/docs/usage.html#server-side-cursors

HTH,

Rob :)
Charlie Clark | 4 Mar 15:51 2010

Re: Deprecated versions

On 03.03.2010 at 12:00, <psycopg-request <at> lists.initd.org> wrote:

> So you suggest removing the version check in DA.py?

Yes. At most it should only happen when ZPsycopgDA is "initialised" by
Zope. But even this can cause problems if psycopg2 has been upgraded
(I had this last night on an old site). With Zope 2.12 supporting eggs,
it makes more sense to put the requirements into setup.py.

Charlie
-- 
Charlie Clark
Helmholtzstr. 20
Düsseldorf
D- 40215
Tel: +49-211-938-5360
GSM: +49-178-782-6226
Devrim GÜNDÜZ | 11 Mar 21:51 2010

Small patch to build psycopg2 against 9.0 Alpha4

Hi,

Attached is a very small patch that detects the "alpha" tag in PostgreSQL
9.0alpha4. Since PostgreSQL will continue releasing alphas in the
future, this patch should appear in all releases.

Regards,
-- 
Devrim GÜNDÜZ
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
PostgreSQL RPM Repository: http://yum.pgrpms.org
Community: devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org  Twitter: http://twitter.com/devrimgunduz
Attachment (psycopg2-pg90.patch): text/x-patch, 666 bytes
Federico Di Gregorio | 11 Mar 22:42 2010

Re: Small patch to build psycopg2 against 9.0 Alpha4

On 11/03/2010 21:51, Devrim GÜNDÜZ wrote:
> Attached is a very small patch that can detect "alpha" tag in PostgreSQL
> 9.0alpha4. Since PostgreSQL will continue releasing alpha sets in the
> future, this patch should appear in all releases.

Applied, thanks.

federico

-- 
Federico Di Gregorio                                       fog@...
 heisenbug /hi:'zen-buhg/ /n./ [from Heisenberg's Uncertainty Principle
  in quantum physics] A bug that disappears or alters its behavior when
  one attempts to probe or isolate it.
