Nathan Reynolds | 8 Feb 15:46

Custom element classes: lost my proxy

Hi all,

As soon as I insert my custom elements into a regular lxml.etree.Element, they revert to the standard Element interface.

Is it possible to get my proxy back for these elements?

Thanks,
Nath
_______________________________________________
lxml-dev mailing list
lxml-dev <at> codespeak.net
http://codespeak.net/mailman/listinfo/lxml-dev
Stefan Behnel | 8 Feb 15:50

Re: Custom element classes: lost my proxy


Nathan Reynolds, 08.02.2010 15:46:

> As soon as I insert my custom elements into a regular lxml.etree.Element,
> they revert to the standard Element interface.
>
> Is it possible to get my proxy back for these elements?

http://codespeak.net/lxml/element_classes.html#generating-xml-with-custom-classes

HTH,
Stefan
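
For readers who cannot follow the link, here is a minimal sketch of the approach that page describes, as far as I understand it: give a parser a class lookup and build the whole tree through that parser's makeelement(), so every element in the document resolves to the custom class. The ItemElement class and its label property are made up for illustration; they are not from the original post.

```python
from lxml import etree


class ItemElement(etree.ElementBase):
    """Hypothetical custom element class with one extra property."""

    @property
    def label(self):
        return self.get('label', '')


# A parser whose documents map every element to ItemElement.
parser = etree.XMLParser()
parser.set_element_class_lookup(
    etree.ElementDefaultClassLookup(element=ItemElement))

# Create the root through the parser's factory rather than etree.Element(),
# so the document is tied to this parser and its lookup.
Element = parser.makeelement

root = Element('root')
child = etree.SubElement(root, 'item', label='first')
del child  # drop our own reference to the proxy

# The proxy is re-created on access and still uses the custom class.
print(type(root[0]).__name__, root[0].label)
```

The point of the design is that the class is looked up per document, so a root created with the plain etree.Element() factory falls back to the default classes.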
Antti Kaihola | 8 Feb 11:11

Typo in exception message

Hi,


I'm catching exceptions from cssselect and my code needs to make decisions based on not only exception classes but also the particular exception messages.

The word "pseudo" is mistyped as "psuedo" in four different exceptions. Are these typos going to be kept unchanged for good, or should my code be prepared to match a corrected version as well?

The same typo seems to occur on the lxml web pages as well, by the way.


Regards,

Antti Kaihola
Espoo, Finland

Stefan Behnel | 8 Feb 12:04

Re: Typo in exception message


Antti Kaihola, 08.02.2010 11:11:
> I'm catching exceptions from cssselect and my code needs to make decisions
> based on not only exception classes but also the particular exception
> messages.

Could you provide some insight into your use case here? I wouldn't mind
adding some machine-readable information to the exceptions, in case that
helps. (patches appreciated)

> The word "pseudo" is mistyped as "psuedo" in four different exceptions. Are
> these typos going to be kept unchanged for good, or should my code be
> prepared to match a corrected version as well?
> 
> The same typo seems to occur on the lxml web pages as well, by the way.

Thanks for catching that. I fixed it in the code and the docs. At least
lxml 2.3 will have this change.

Stefan
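
Until machine-readable information is available on the exceptions, code that has to inspect the message text can accept both spellings. The following is only a sketch: the helper name and the sample messages are invented for illustration and are not the actual cssselect messages.

```python
import re

# Matches both the misspelled "psuedo" and the corrected "pseudo".
_PSEUDO_RE = re.compile(r'ps(?:eu|ue)do', re.IGNORECASE)


def mentions_pseudo(exc):
    """Return True if the exception message talks about a pseudo-class
    or pseudo-element, whichever way the word is spelled."""
    return _PSEUDO_RE.search(str(exc)) is not None


# Hypothetical messages, old and new spelling.
print(mentions_pseudo(ValueError('The psuedo-class :foo is unknown')))
print(mentions_pseudo(ValueError('The pseudo-class :foo is unknown')))
```

Matching on message text stays fragile either way, so checking the exception class first and the message second keeps the damage small if the wording changes again.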
Martin Aspeli | 6 Feb 12:46

Copying children including text nodes

Hi,

I have two trees that were parsed with the HTML parser. The source tree is:

<html>
<head>
<body>
     Foo
     <p>Bar</p>
     Baz
</body>
</html>

The target is:

<html>
<head>
<body>
     <div id="target">Placeholder</div>
</body>
</html>

Now, I want to replace the whole of <div id="target"> tag (so, the tag 
and its children) with the *contents* of the <body> tag in the source 
tree. I obviously don't want the body tag itself.

Performance is important. Also, I don't care about the source tree after 
I'm done, so if "moving" rather than copying makes things faster/easier, 
that's OK.

What's the best way to do this? My naive approach was to do this:

sourceBody = source.find('body')
for sourceBodyChild in sourceBody:
    targetPlaceholder.addnext(sourceBodyChild)
targetPlaceholder.getparent().remove(targetPlaceholder)

However, this loses the text ("Foo"). I guess this is one case where 
dealing with text nodes explicitly would actually be better. :)

Martin

-- 
Author of `Professional Plone Development`, a book for developers who
want to work with Plone. See http://martinaspeli.net/plone-book
Stefan Behnel | 8 Feb 14:41

Re: Copying children including text nodes


Martin Aspeli, 06.02.2010 12:46:
> I have two trees that were parsed with the HTML parser. The source tree is:
> 
> <html>
> <head>
> <body>
>      Foo
>      <p>Bar</p>
>      Baz
> </body>
> </html>
> 
> The target is:
> 
> <html>
> <head>
> <body>
>      <div id="target">Placeholder</div>
> </body>
> </html>
> 
> Now, I want to replace the whole of <div id="target"> tag (so, the tag 
> and its children) with the *contents* of the <body> tag in the source 
> tree. I obviously don't want the body tag itself.

parent.replace() doesn't currently support sequence insertion, but I would
expect this to work:

    prev = div_element.getprevious()
    if prev is None:
        target_body[:1] = source_body[:]
        target_body.text = source_body.text # take care of existing text?
    else:
        pos = target_body.index(div_element)
        target_body[pos:pos+1] = source_body[:]
        # either side may be None, so normalise to '' before concatenating
        prev.tail = (prev.tail or '') + (source_body.text or '')

> Performance is important. Also, I don't care about the source tree after 
> I'm done, so if "moving" rather than copying makes things faster/easier, 
> that's OK.

Moving is certainly faster than copying, as copying does at least the same
amount of work, plus the memory allocations. If copying was required, you
could always do a deepcopy of the source content before inserting it.

I can't give any further comments on performance, though. You'll need to do
your own benchmarks (although I'm always interested in the results :)

Stefan
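
A self-contained sketch of the same idea, for anyone who wants to try it. It uses addprevious() rather than slice assignment and re-attaches the loose text by hand; the function name is invented for illustration, and it assumes the placeholder has a parent and that the source tree can be emptied.

```python
from lxml import etree


def replace_with_children(placeholder, source_parent):
    """Replace `placeholder` with the text and children of `source_parent`.

    The children are moved, not copied, so `source_parent` ends up empty.
    """
    parent = placeholder.getparent()
    prev = placeholder.getprevious()
    children = list(source_parent)        # snapshot before moving anything

    # Text that is not attached to a moved element and needs manual care.
    lead = source_parent.text or ''       # e.g. "Foo" before the first child
    trailing = placeholder.tail or ''     # text that followed the placeholder

    for child in children:
        placeholder.addprevious(child)    # moves the child with its tail
    parent.remove(placeholder)            # also drops the placeholder's tail

    if children:
        children[-1].tail = (children[-1].tail or '') + trailing
    else:
        lead += trailing

    if prev is not None:
        prev.tail = (prev.tail or '') + lead
    else:
        parent.text = (parent.text or '') + lead
    source_parent.text = None


source = etree.fromstring('<body>Foo<p>Bar</p>Baz</body>')
target = etree.fromstring('<body><div id="target">Placeholder</div></body>')
replace_with_children(target.find('div'), source)
print(etree.tostring(target))
```

If the source tree has to survive, passing copy.deepcopy(source) instead keeps the original intact at the cost of the extra allocations mentioned above.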
David Chandek-Stark | 5 Feb

Get the xml-stylesheet processing instruction

Hi,


I had a hard time tracking this info down (only figured out after reading the thread at http://codespeak.net/pipermail/lxml-dev/2006-September/001903.html), so posted a recipe:

Feel free to use or copy for any purpose.

Thanks,
David

--
David Chandek-Stark
dchandekstark (at) gmail (dot) com
Emanuele D'Arrigo | 5 Feb 21:02

Re: Get the xml-stylesheet processing instruction

On 5 February 2010 18:38, David Chandek-Stark <dchandekstark <at> gmail.com> wrote:
> I had a hard time tracking this info down (only figured out after reading the thread at http://codespeak.net/pipermail/lxml-dev/2006-September/001903.html), so posted a recipe:


True, obtaining PIs (and comments) is not exactly intuitive or straightforward (in fact, for head-of-file PIs it's straightbackward!! =) ). I guess ideally there could be a list of them available via read-only properties on the tree object, e.g. docTree.PIs and docTree.comments.

Manu
John Krukoff | 5 Feb 21:58

Re: Get the xml-stylesheet processing instruction

On Fri, 2010-02-05 at 20:02 +0000, Emanuele D'Arrigo wrote:
<snip>

> True, obtaining PIs (and comments) is not exactly intuitive or
> straightforward (in fact, for head-of-file PIs it's straightbackward!!
> =) ). I guess ideally there could be a list of them available via
> read-only properties on the tree object, e.g. docTree.PIs and
> docTree.comments.
> 
> Manu

I agree, although my version of ideally would be for the ElementTree class 
to do a better imitation of being a root node, so that something simple 
like this would work for looping over all root level nodes:

>>> from lxml import etree
>>> x = etree.XML( '<!--Comment--><?PI?><root/>' )
>>> t = x.getroottree()
>>> list( t )
[<!--Comment-->, <?PI?>, <Element root at 872f66c>]

Or if ElementTree even supported just getchildren() to retrieve the same
data. Even more ideally ;) it'd support insert/append/replace/remove and
company for editing such root level elements. Oh yeah, and if
xpath( '/' ) returned said extended ElementTree as the root node, as
long as I'm wishing for ponies and unicorns.

I suppose the reason it's hard is because effbot's ElementTree hasn't
ever dealt with the issue of non-element root level contents.

-- 
John Krukoff <jkrukoff <at> ltgc.com>
Land Title Guarantee Company
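
In the meantime, the root-level nodes can be collected by walking the siblings of the root element in both directions. This is a rough sketch rather than an official recipe; both helper names are invented for illustration.

```python
from lxml import etree


def root_level_nodes(tree):
    """Return the top-level nodes of a document in document order:
    leading comments/PIs, the root element, then trailing comments/PIs."""
    root = tree.getroot()
    # preceding=True walks backwards from the root, so reverse the result.
    before = list(root.itersiblings(preceding=True))
    before.reverse()
    after = list(root.itersiblings())
    return before + [root] + after


def find_stylesheet_pi(tree):
    """Return the first root-level xml-stylesheet PI, or None."""
    for node in root_level_nodes(tree):
        if node.tag is etree.PI and node.target == 'xml-stylesheet':
            return node
    return None


doc = etree.XML(
    '<?xml-stylesheet type="text/xsl" href="style.xsl"?>'
    '<!--Comment--><root/>'
).getroottree()

print(root_level_nodes(doc))
print(find_stylesheet_pi(doc).text)
```

Comments and processing instructions are told apart from real elements by their tag, which is the etree.Comment or etree.PI factory instead of a string.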
jholg | 4 Feb 00:33

Re: [Bug 488222] Feature request: add better schematron support to lxml



> > This speaks for pulling the result accessor into the Schematron class,
> > probably as a class attribute that can be overridden on an instance level.
> >
> > The same might make sense for the iso-schematron implementation xsl
> > transformation steps.
>
> Sounds like a much better interface. Any interesting global options would
> be better overridden by subtyping the validator class, so class attributes
> make sense to me.

Committed to trunk:
https://codespeak.net/viewvc/?view=rev&revision=71090

This simply exposes the skeleton xslt steps and the validation result xpath as class attributes.
I consider the iso-schematron work pretty much finished for now...

Holger
Richard Baron Penman | 31 Jan 05:06

ElementTree 1.3a xpath position broken?

hello,

I am after xpath support for an application running on Google App Engine, which unfortunately rules out lxml. 
According to this document (http://effbot.org/zone/element-xpath.htm) the development version of ElementTree 1.3a has additional support for xpath, which covers my use cases.


From my tests I found attributes and child nodes work:

>>> from elementtree import ElementTree
>>> tree = ElementTree.fromstring('<a><b></b><b><c class="test"></c></b></a>')
>>> print list(tree.findall('.//*[@class="test"]'))
[<Element 'c' at b7caa0cc>]
>>> print list(tree.findall('.//b[c]'))
[<Element 'b' at b7caa08c>]


However tag positions appear to be broken:
>>> print list(tree.findall('.//b[1]')) # should return b element
[] 


Have I missed something? Suggestions?

regards,
Richard
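
For comparison only (this is not a fix for the 1.3a snapshot above): the same three queries can be run against the ElementTree that ships in the Python 3 standard library as xml.etree.ElementTree. The sketch below makes it easy to check how a given interpreter treats the position predicate; the variable names are mine.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring('<a><b></b><b><c class="test"></c></b></a>')

# Attribute predicate: any descendant with class="test".
by_attribute = root.findall('.//*[@class="test"]')

# Child predicate: every <b> that has a <c> child.
by_child = root.findall('.//b[c]')

# Position predicate: every <b> that is the first <b> of its parent.
by_position = root.findall('.//b[1]')

print([e.tag for e in by_attribute])
print(by_child == [root[1]])
print(by_position == [root[0]])
```

Positions in this XPath subset are counted among siblings with the same tag under their own parent, not across the whole result set.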