Emmanuel Tabard | 26 Jan 18:42

Fwd: [sql] memory consumption


It's so slow and takes so much memory because it was designed to work with
a few hundred entries. :-D

Fair enough :D

Wow, that's an interesting problem... I guess it can be heavily improved,
especially if we can store some information on disk.
Anyway, it's not an easy task: the real problem is that we don't have a
unique ID to identify a movie (that would be the ID that we're saving... but
the problem is matching it to the other information of the row: title, year,
imdb_index, kind, etc. etc.)

The thing is, the whole database takes 5 GB. That's why I was wondering how the script can eat 20 GB of memory. Maybe SQLObject leaks!
You could do it in 4 steps:
- Grab all the information from the existing database (imdb id, title, index, year, kind) and store it in a temporary table or text file.
- Drop the database.
- Rebuild it.
- Iterate over your file/temp table and restore the ids one by one.

But it could be slow to query the fresh database with your temp table data (because of the text fields...).
Anyway, it takes 10 hours to store the ids in memory. Can't be worse :D

To make it faster you can also generate a unique signature for each row (sha1(title, index, year, kind)?). Index this field and your temp table would be: imdbid | signature.
It should be quick.
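
A minimal sketch of the signature idea, assuming Python 3. It uses sqlite3 only to stay self-contained (the thread is about MySQL); the table name saved_ids and the function names are hypothetical and are not part of imdbpy2sql.

```python
import hashlib
import sqlite3


def row_signature(title, imdb_index, year, kind):
    """Return a SHA-1 hex digest identifying a title row by its natural key."""
    # Join with a control character so ("ab", "c") and ("a", "bc")
    # cannot produce the same input string.
    parts = [title, imdb_index or "", "" if year is None else str(year), kind]
    return hashlib.sha1("\x1f".join(parts).encode("utf-8")).hexdigest()


def save_ids(conn, rows):
    """Step 1: store (signature, imdb_id) pairs in an indexed side table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS saved_ids "
        "(signature TEXT PRIMARY KEY, imdb_id INTEGER)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO saved_ids VALUES (?, ?)",
        ((row_signature(t, i, y, k), imdb_id) for imdb_id, t, i, y, k in rows),
    )


def restore_id(conn, title, imdb_index, year, kind):
    """Step 4: look up the saved imdb id for a freshly imported row."""
    sig = row_signature(title, imdb_index, year, kind)
    found = conn.execute(
        "SELECT imdb_id FROM saved_ids WHERE signature = ?", (sig,)
    ).fetchone()
    return found[0] if found else None
```

The lookup then hits a single indexed fixed-length column instead of comparing several text fields. The side table has to live outside the database that gets dropped and rebuilt.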

With MySQL you can also warm up the indexes this way:

SHOW TABLES in imdbpy
-> for each table LOAD INDEX INTO CACHE table
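
A rough sketch of that loop in Python 3. The function name is hypothetical; the table names are assumed to come from SHOW TABLES, and running the statements through a MySQL cursor is left out. As far as I know, LOAD INDEX INTO CACHE only applies to MyISAM tables.

```python
def warmup_statements(table_names):
    """Build one LOAD INDEX INTO CACHE statement per table name."""
    statements = []
    for name in table_names:
        # Backtick-quote the identifier; double any embedded backtick.
        quoted = "`" + name.replace("`", "``") + "`"
        statements.append("LOAD INDEX INTO CACHE " + quoted)
    return statements
```

Each returned string can be passed to cursor.execute() of whichever MySQL driver is in use.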


- Emmanuel

On 26 Jan 2012 at 17:23, Davide Alberani wrote:

On Thu, Jan 26, 2012 at 15:32, Emmanuel Tabard <manu-+vDP33JFECNGWvitb5QawA@public.gmane.org> wrote:

First of all, thank you for imdbpy. This is really plug'n play, well done !!!

Thanks. :-P

Context :
- Import all imdb database (from text dumps) - first time it's fast and ok
- I have the imdb ids for 90% of titles and names (no need for companies and characters)

That's a lot of data. :)

My problem comes when imdbpy updates my database. It takes hours to save the imdb ids and it consumes a *lot* of memory. Almost all of my RAM (24 GB)...

Is there a way to optimize that step? Why does it take so much memory?

It's so slow and takes so much memory because it was designed to work with
a few hundred entries. :-D

Wow, that's an interesting problem... I guess it can be heavily improved,
especially if we can store some information on disk.
Anyway, it's not an easy task: the real problem is that we don't have a
unique ID to identify a movie (that would be the ID that we're saving... but
the problem is matching it to the other information of the row: title, year,
imdb_index, kind, etc. etc.)

Hmmm... I promise to think about it over the weekend.  If anyone has a
nice solution to this problem, any hint is welcome!

--
Davide Alberani <davide.alberani-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>  [PGP KeyID: 0x465BFD47]
http://www.mimante.net/


------------------------------------------------------------------------------
Keep Your Developer Skills Current with LearnDevNow!
The most comprehensive online learning library for Microsoft developers
is just $99.99! Visual Studio, SharePoint, SQL - plus HTML5, CSS3, MVC3,
Metro Style Apps, more. Free future releases when you subscribe now!
http://p.sf.net/sfu/learndevnow-d2d
_______________________________________________
Imdbpy-devel mailing list
Imdbpy-devel@...
https://lists.sourceforge.net/lists/listinfo/imdbpy-devel
Garland, Ken R | 26 Jan 16:34

Re: [sql] memory consumption

Could you provide the methods you use for update and initial db creation, or pastebin the code?

On Thu, Jan 26, 2012 at 9:32 AM, Emmanuel Tabard <manu-+vDP33JFECNGWvitb5QawA@public.gmane.org> wrote:
hi,

First of all, thank you for imdbpy. This is really plug'n play, well done !!!

Context :
- Import all imdb database (from text dumps) - first time it's fast and ok
- I have the imdb ids for 90% of titles and names (no need for companies and characters)
- I'll pay imdb for using that data (I need to control the data instead of using their own api)

My problem comes when imdbpy updates my database. It takes hours to save the imdb ids and it consumes a *lot* of memory. Almost all of my RAM (24 GB)...

Is there a way to optimize that step? Why does it take so much memory?

I can help you with this if you don't have much time, I'm python dev but I need input :)


Cheers,

- Emmanuel

PS : Sorry about my first non-member thread, wrong email ...



--
- Ken
Emmanuel Tabard | 26 Jan 15:32

[sql] memory consumption

hi,

First of all, thank you for imdbpy. This is really plug'n play, well done !!!

Context : 
- Import all imdb database (from text dumps) - first time it's fast and ok
- I have the imdb ids for 90% of titles and names (no need for companies and characters)
- I'll pay imdb for using that data (I need to control the data instead of using their own api)

My problem comes when imdbpy updates my database. It takes hours to save the imdb ids and it consumes a *lot*
of memory. Almost all of my RAM (24 GB)...

Is there a way to optimize that step? Why does it take so much memory?

I can help you with this if you don't have much time, I'm python dev but I need input :)

Cheers,

- Emmanuel

PS : Sorry about my first non-member thread, wrong email ...
Davide Alberani | 1 Nov 14:54

IMDbPY 4.8 and new site released

Hi all,
I've just released the long-awaited IMDbPY 4.8, with too many bug fixes
to mention.  Sorry for the slowdown in development; I'm sure there
are still many bugs and I'd like to see some fixes to some core pieces of
code (after more than 7 years and almost 50 releases, it's probably a good
idea to rewrite a function or two ;-)

So, if anyone wants to help, let us know!

With this release, we also have a shiny new web site, courtesy of
Alberto Malagoli who kindly joined the development team (thanks
and welcome aboard, Alberto!)

As usual, you can download IMDbPY from: http://imdbpy.sf.net/

Enjoy!

-- 
Davide Alberani <davide.alberani@...>  [PGP KeyID: 0x465BFD47]
http://www.mimante.net/

satyam mukherjee | 26 Aug 00:12

A local database of Awardlist

Hi 

Is it possible to get the awards list for all movies into a local database?
darklow | 23 Aug 13:49

Is there a production status like "In Production", "Planning", "In Development" in the imdbpy database?

Is there a production status anywhere in the imdbpy database, like "In
Production", "Planning", "In Development" and so on for upcoming
movies?
Maybe I missed some .csv data files, but I can't find any in my DB.
Thank you.

Saravanan | 31 Jul 09:58

Getting information about genres only

Hello,

I am trying to write an Ubuntu Unity lens using IMDbPY. For this
purpose, I need only genre information. I took a look at the code and
did not find any get_movie_genre kind of function. So currently, I am
using ia.update(x,"main"). Is there any way to get just the genre information?

Thanks for developing this wonderful program !

Regards,
Saravanan

LinuxUser | 26 May 23:05

only get certain data with update?

Is there a way to only get some data with ia.update to speed up
things? I just need votes and rating.

LinuxUser | 9 May 12:00

AttributeError: 'module' object has no attribute 'IMDb'

http://pastebin.com/XLmiYbFW
---------------------------
Traceback (most recent call last):
  File "/usr/lib/pymodules/python2.6/supybot/callbacks.py", line 1180, in _callCommand
    self.callCommand(command, irc, msg, *args, **kwargs)
  File "/usr/lib/pymodules/python2.6/supybot/utils/python.py", line 86, in g
    f(self, *args, **kwargs)
  File "/usr/lib/pymodules/python2.6/supybot/callbacks.py", line 1166, in callCommand
    method(irc, msg, *args, **kwargs)
  File "/usr/lib/pymodules/python2.6/supybot/commands.py", line 913, in newf
    f(self, irc, msg, args, *state.args, **state.kwargs)
  File "/home/bash/Download/Supybot-0.83.4.1/plugins/Supybot-imdb/plugin.py", line 81, in imdb
    ia = imdb.IMDb()
AttributeError: 'module' object has no attribute 'IMDb'

----------------

I googled it and a bunch of people said to rename the py file but that
obviously isn't the problem. I don't know where else to turn so I'm posting it
here.
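
One thing worth ruling out is that the name imdb resolves to something other than the IMDbPY package at import time (for instance a plugin module or directory with the same name). This is only a guess from the traceback, not a confirmed diagnosis. A small sketch, assuming Python 3 (on Python 2.6, printing imdb.__file__ in the interpreter gives the same information); describe_module is a hypothetical helper.

```python
import importlib


def describe_module(name, attribute):
    """Report where a module was loaded from and whether it has an attribute."""
    module = importlib.import_module(name)
    return {
        # Path of the file that was actually imported (None for some built-ins).
        "file": getattr(module, "__file__", None),
        "has_attribute": hasattr(module, attribute),
    }
```

Calling describe_module("imdb", "IMDb") from the same environment the bot runs in shows which file is being picked up. If the path does not point into the IMDbPY installation, the import is being shadowed.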

Thomas Stewart | 10 Apr 23:25

imdbpy2sql failing

Hi,

I'm having some trouble with importing the text files with imdbpy2sql.
I'm running Debian with python 2.6.6-8+b1, postgresql 9.0.3-1 and
imdbpy 4.7.0-1.

I created a database called imdb in the usual way. Debian puts
imdbpy2sql in /usr/share/doc/python-imdbpy/examples/imdbpy2sql.py.gz.
I usually extract it and put it in /tmp/imdbpy2sql. I ran this:
$ /tmp/imdbpy2sql.py -d ~/imdb/lists -u postgres:///var/run/postgresql/imdb

It starts processing as normal. However, at some point in the middle of
the actors, psycopg2 throws a DataError.

 * FLUSHING SQLData...
SCANNING actor: Hartley, Jalaal
SCANNING actor: Harwood, Anthony (II)
 * FLUSHING PersonsCache...
 * FLUSHING SQLData...
SCANNING actor: Hatcher, Steve
SCANNING actor: Havers, Nigel
 * FLUSHING SQLData...
SCANNING actor: Hayden, Luke
 * FLUSHING CharactersCache...
Traceback (most recent call last):
  File "/tmp/imdbpy2sql.py", line 2950, in <module>
    run()
  File "/tmp/imdbpy2sql.py", line 2811, in run
    castLists(_charIDsList=characters_imdbIDs)
  File "/tmp/imdbpy2sql.py", line 1575, in castLists
    doCast(f, roleid, rolename)
  File "/tmp/imdbpy2sql.py", line 1534, in doCast
    cid = CACHE_CID.addUnique(role)
  File "/tmp/imdbpy2sql.py", line 957, in addUnique
    else: return self.add(key, miscData)
  File "/tmp/imdbpy2sql.py", line 950, in add
    self[key] = c
  File "/tmp/imdbpy2sql.py", line 860, in __setitem__
    self.flush()
  File "/tmp/imdbpy2sql.py", line 883, in flush
    self._toDB(quiet)
  File "/tmp/imdbpy2sql.py", line 1185, in _toDB
    CURS.executemany(self.sqlstr, self.converter(l))
psycopg2.DataError: invalid byte sequence for encoding "UTF8": 0xc320

When I run /usr/share/doc/python-imdbpy/goodies/reduce.sh to get the
data size down a little the whole import works fine. So I'm guessing
there are some stray characters in the text somewhere that are not
being decoded properly to unicode, but I have no idea where to try to
fix it.
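
For context on the error: 0xc3 starts a two-byte UTF-8 sequence and must be followed by a continuation byte in the range 0x80-0xBF, so 0xc3 followed by 0x20 (a space) is rejected. A hedged sketch, assuming Python 3, for locating such lines in a list file and decoding them with a fallback. Both helper names are hypothetical, and using Latin-1 as the fallback is an assumption about how the source files are encoded.

```python
def find_invalid_utf8(lines):
    """Return (line_number, raw_bytes) for each line that is not valid UTF-8."""
    bad = []
    for number, raw in enumerate(lines, start=1):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError:
            bad.append((number, raw))
    return bad


def decode_with_fallback(raw):
    """Decode as UTF-8, falling back to Latin-1 (which accepts any byte)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")
```

A file opened in binary mode (open(path, "rb")) can be passed directly to find_invalid_utf8, since iterating over it yields one bytes object per line.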

Regards
--
Tom

Davide Alberani | 23 Jan 16:12

IMDbPY 4.7 released

Released after a long delay, and despite that in a hurry, IMDbPY 4.7 can
be found here:
  http://imdbpy.sf.net/

This is a transitional release, after the recent redesign of the IMDb pages.
A new account is used and the new pages are parsed; for sure there are
still many bugs; please read the README.redesign file for other details.

Please contribute to the development with fixes and bug reports.

Enjoy!
-- 
Davide Alberani <davide.alberani@...>  [PGP KeyID: 0x465BFD47]
http://www.mimante.net/
