Dave Kuhlman | 1 Apr 2010 22:41

Re: Temporary data attached to custom subclasses

> Date: Wed, 31 Mar 2010 10:34:34 +0200
> From: Stefan Behnel <stefan_ml <at> behnel.de>
> To: lxml-dev <at> codespeak.net
> Subject: Re: [lxml-dev] Temporary data attached to custom subclasses
> 
> Dave Kuhlman, 29.03.2010 23:48:
> > I've been using the custom subclasses capability of lxml.  It's
> > slick.
> >
> > I do, however, miss the ability to attach temporary data to the
> > ElementBase subclasses.  (see the warnings under "Element
> > initialization" at http://codespeak.net/lxml/element_classes.html)
> >
> > I can, as suggested by the docs, add attributes or children to the
> > underlying etree.Element, but that means that I'd have to strip
> > that temporary data off when I want to serialize the tree.
> 
> As long as your tree doesn't change, the easiest solution is to keep a 
> reference to all Elements ("list(root.iter())") and then just store the 
> data in the proxy instances. They are guaranteed not to change as long as 
> there is a live reference to them.
> 
> If your tree changes, you can still try to add new Elements to your 
> keep-alive list to get the same behaviour, but you may need to take a 
> little more care when you remove elements, so that you only remove them 
> from the keep-alive list when you are sure they'll get discarded.
> 

Stefan -


Stefan Behnel | 1 Apr 2010 23:14

Re: Temporary data attached to custom subclasses

Dave Kuhlman, 01.04.2010 22:41:
> You might want to add your two paragraphs (above) or something like
> the following:
>
>      "If you really must store temporary data on an element that you
>      do not want serialized, then you should put any nodes which
>      must be persistent on a keep-alive list (or other container),
>      since they are guaranteed not to change as long as there is a
>      live reference to them."
>
> Something like that might save you from having to answer this
> question yet again at some time in the future.

Thanks, I'll add something like that to the docs.
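For readers of the archive, the keep-alive approach discussed above might look roughly like the following. This is only a sketch, not the wording that went into the docs: the class name AnnotatedElement, the helper parse_annotated, and the attribute visit_order are made up for illustration, and it assumes the tree is not modified while the temporary data is in use.

```python
from lxml import etree


class AnnotatedElement(etree.ElementBase):
    """Hypothetical custom element class; its proxies can carry Python attributes."""


def parse_annotated(xml_text):
    # Make the parser create AnnotatedElement proxies for all elements.
    parser = etree.XMLParser()
    parser.set_element_class_lookup(
        etree.ElementDefaultClassLookup(element=AnnotatedElement))
    return etree.fromstring(xml_text, parser)


root = parse_annotated("<root><a/><b><c/></b></root>")

# Holding a reference to every proxy keeps the proxies (and whatever is
# stored on them) alive for as long as this list exists.
keep_alive = list(root.iter())

# Temporary data lives on the Python proxy only, not in the XML tree.
for position, element in enumerate(keep_alive):
    element.visit_order = position

# Looking the element up again returns the same live proxy.
first_child = root[0]
serialized = etree.tostring(root)
```

Because the data sits on the proxy objects and not in the tree, nothing has to be stripped before serializing. If keep_alive were dropped, the proxies could be discarded and recreated later without the extra attributes.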

> And, a last point:  for some purposes, instead of:
>
>      keep_alive = list(root.iter())
>
> the following might be better:
>
>      keep_alive = set(root.iterdescendants())
>      keep_alive.add(root)
>
> because:
>
> 1. iterdescendants() plus adding root puts all nodes into
>     keep_alive.

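Then that shouldn't be any different from

     keep_alive = set(root.iter())

The only reason why there *is* an iterdescendants() is that iter() yields 
all nodes in the subtree, including the root itself.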

Dave Kuhlman | 1 Apr 2010 23:26

Re: Temporary data attached to custom subclasses

On Thu, Apr 01, 2010 at 11:14:08PM +0200, Stefan Behnel wrote:
> Dave Kuhlman, 01.04.2010 22:41:
> >
> >      keep_alive = set(root.iterdescendants())
> >      keep_alive.add(root)
> >
> > because:
> >
> > 1. iterdescendants() plus adding root puts all nodes into
> >     keep_alive.
> 
> Then that shouldn't be any different from
> 
>      keep_alive = set(root.iter())
> 
> The only reason why there *is* an iterdescendants() is that iter() yields 
> all nodes in the subtree, including the root itself.
> 

Stefan -

You are right.  My mistake.  I thought I had done a test with
iter(), but I must have confused myself somehow.

- Dave
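
For anyone who wants to check the equivalence Stefan describes, a small test along these lines should do. It is a sketch using a made-up four-element document; the variable names are illustrative only.

```python
from lxml import etree

root = etree.fromstring("<root><a/><b><c/></b></root>")

# iter() yields the root itself followed by all of its descendants.
with_iter = set(root.iter())

# iterdescendants() skips the root, so it has to be added by hand.
with_descendants = set(root.iterdescendants())
with_descendants.add(root)

# Both sets hold the same proxy objects, since the first set keeps
# every proxy alive while the second one is being built.
same_nodes = with_iter == with_descendants
```

Note that the comparison relies on the first set still being referenced when the second one is built; otherwise the proxies could be recreated and the two sets would hold different objects.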

-- 
Dave Kuhlman
http://www.rexx.com/~dkuhlman

Stefan Behnel | 5 Apr 2010 10:41

Re: lxml iterparse generator not returning anything

Joe Sarre, 30.03.2010 16:53:
> I'm finding that when using iterparse, the generator always throws
> StopIteration immediately, without returning any data.

The only comment I can give on this one is that I've never seen this 
before. I'd first try to make sure it's really not a problem in your setup.

Stefan

Wichert Akkerman | 5 Apr 2010 10:52

Re: adding a namespace

On 3/23/10 20:09 , Stefan Behnel wrote:
> Simon showed you a way, but apart from that, it's a missing feature.
> Changing namespace mappings is nothing that the ElementTree API needs to
> care about, and lxml clearly lacks a good way to do it.
>
> Could you file a ticket on the bug tracker? This should be doable for 2.3.

Most certainly: https://bugs.launchpad.net/lxml/+bug/555602

Wichert.

Wichert Akkerman | 5 Apr 2010 10:58

downloads-a-plenty on launchpad page?

I just noticed https://launchpad.net/lxml/ really likes you to download 
the lxml 2.2 release. So much in fact it has that download listed 129 
times. I suspect that isn't intentional? :)

Wichert.

Wichert Akkerman | 5 Apr 2010 11:00

Re: downloads-a-plenty on launchpad page?

On 4/5/10 10:58 , Wichert Akkerman wrote:
> I just noticed https://launchpad.net/lxml/ really likes you to download
> the lxml 2.2 release. So much in fact it has that download listed 129
> times. I suspect that isn't intentional? :)

At least it is consistent: looking at 
https://launchpad.net/lxml/+download this appears to happen for all lxml 
releases.

Wichert.

Sidnei da Silva | 6 Apr 2010 20:28

Re: downloads-a-plenty on launchpad page?

On Mon, Apr 5, 2010 at 6:00 AM, Wichert Akkerman <wichert <at> wiggy.net> wrote:
> On 4/5/10 10:58 , Wichert Akkerman wrote:
>> I just noticed https://launchpad.net/lxml/ really likes you to download
>> the lxml 2.2 release. So much in fact it has that download listed 129
>> times. I suspect that isn't intentional? :)
>
> At least it is consistent: looking at
> https://launchpad.net/lxml/+download this appears to happen for all lxml
> releases.

Seems like it only happens for lxml. I brought it up with the
Launchpad team, they are looking into it.

-- Sidnei

Stephen Graham | 6 Apr 2010 21:41

installing lxml on MacOS

I am trying to go down the learning curve on lxml.

I tried to follow the install instructions to install lxml on my Mac.

As root, I did:
STATIC_DEPS=true easy_install lxml

The install chugged away, but eventually failed:

...

Undefined symbols for architecture i386:
   "_gzdirect", referenced from:
       ___xmlParserInputBufferCreateFilename in libxml2.a(xmlIO.o)
ld: symbol(s) not found for architecture i386
collect2: ld returned 1 exit status
Undefined symbols for architecture ppc:
   "_gzdirect", referenced from:
       ___xmlParserInputBufferCreateFilename in libxml2.a(xmlIO.o)
ld: symbol(s) not found for architecture ppc
collect2: ld returned 1 exit status
lipo: can't open input file: /var/tmp//ccsmIZu8.out (No such file or directory)
make[2]: *** [xmllint] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2
Traceback (most recent call last):
   File "/usr/bin/easy_install", line 8, in <module>
     load_entry_point('setuptools==0.6c7', 'console_scripts', 'easy_install')()

David Lindquist | 7 Apr 2010 18:56

parse timeout

Hello,

I have to parse a series of URLs, some of which might hang for an
unacceptable length of time. I cannot figure out how to add a timeout:

import socket
from lxml.html import parse

socket.setdefaulttimeout(10)
doc = parse('http://example.com/hang_for_a_long_time') # this might hang indefinitely

Is there some other way to add a timeout, short of recreating the parse function using urllib2?

Thanks,

David Lindquist
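
One possible approach, offered as a sketch and not as an answer from this thread: fetch the page in Python with an explicit timeout and hand the open response to lxml, so that the download goes through Python's own socket handling. The example uses Python 3's urllib.request in place of urllib2, and the helper name parse_with_timeout is made up for illustration.

```python
import urllib.request

from lxml.html import parse


def parse_with_timeout(url, timeout_seconds=10):
    """Fetch url with a socket timeout, then let lxml parse the response."""
    # The timeout applies to each blocking socket operation (connect, read),
    # not to the total download time.
    with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
        # base_url lets lxml resolve relative links against the original URL.
        return parse(response, base_url=url)
```

A call such as parse_with_timeout('http://example.com/hang_for_a_long_time') would then be expected to raise an exception on a stalled connection (socket.timeout or urllib.error.URLError, depending on where it stalls) and not block indefinitely. A server that keeps trickling data can still take longer than the timeout in total.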
