Martijn Faassen | 8 Jan 2005 11:11

test mail

Hm,

Yesterday's mail didn't seem to have arrived, trying a test email.

Regards,

Martijn

Martijn Faassen | 8 Jan 2005 11:12

lxml.etree (ElementTree reimplementation) is getting there

Hi there,

[resending this, as yesterday's mail never seems to have arrived]

The lxml.etree implementation of ElementTree, on top of libxml2, is
getting there now. It features automatic memory management and quite a
bit of ElementTree compatibility. Not all of the ElementTree API has
been implemented yet, but enough for many use cases.

I did discover in the process of debugging that you need a recent
version of libxml2 to make it all work without memory errors; apparently
earlier ones, like the version in my debian unstable (2.6.11), still
contain some bugs.

I'm testing with libxml2 version 2.6.16 myself, so you may want to try
that one too if you want to play with this code. You'll have to modify
setup.py to make it use your installation of libxml2 -- the three
variables to modify are at the top.

So, check it out (svn co http://codespeak.net/lxml/trunk lxml), compile
it, and do a 'make test'. And tell me whether the tests pass on your
machine!

Regards,

Martijn

Martijn Faassen | 14 Jan 2005 19:15

lxml progress

Hi there,

Since some people seem to be actually reading this and some progress has 
been made, I thought I'd give a report of what's been happening with lxml.

* Since last week, I've added a lot more of the ElementTree API, such as
   the .find() function and friends, by directly using the code from
   ElementTree.

* I actually am running the ElementTree and cElementTree test suites
   now. I still need to disable some tests, but a significant fraction is
   indeed running.

* I've improved the way libxml2's parser functionality gets used, in
   order to implement ElementTree's top-level parse() function.

* I've added XPath support to lxml.etree! An example of what you can do:

 >>> from lxml import etree
 >>> tree = etree.parse('ot.xml')
 >>> tree.xpath('(//v)[5]/text()')
[u'And God called the light Day, and the darkness he called Night. And 
the evening and the morning were the first day.\n']

or, say, this, modifying the elements returned:

 >>> result = tree.xpath('(//v)[5]')
 >>> result[0].text = 'The day and night verse.'
 >>> tree.xpath('(//v)[5]/text()')
[u'The day and night verse.']
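
For readers without ot.xml at hand, here is a minimal self-contained sketch
of the same find-and-modify pattern. It is an illustration only: it uses the
standard library's xml.etree.ElementTree as a stand-in for lxml.etree, it
relies on .find()/.findall() rather than .xpath() (which the standard
library does not offer), and the sample document is made up.

```python
import xml.etree.ElementTree as ET  # stand-in; lxml.etree aims for the same calls

# Hypothetical sample data, not the real ot.xml.
SAMPLE = "<b><c n='1'><v>First verse.</v><v>Second verse.</v></c></b>"

root = ET.fromstring(SAMPLE)

# find() returns the first match, findall() returns every match.
first = root.find("c/v")
verses = root.findall(".//v")

# Elements are live objects: changing one changes the tree itself.
first.text = "Changed verse."
texts = [v.text for v in verses]
print(texts)
```

The second lookup sees the new text because the element returned by find()
is the same object that lives in the tree.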

Marc-Antoine Parent | 20 Jan 2005 19:24

lxml XPath

>> I have started looking at what is missing for me to use this in my 
>> own projects, and the only big piece I could find missing for my 
>> purposes was XPathRegisterFunction. So I am starting to hack at it. 
>> (I know Python well, libxml2 somewhat though mostly as a user, and I 
>> am new to Pyrex. Great opportunity to learn, I love it so far.)
>
> Ah, great, a contributor! :)

!!! I am sure those are always scarce. I cannot put tons of time 
either, but I am looking around at this point.

First, I (finally) realized that the XPath support is in Python and not 
that of libxml. Oops! I had misguided expectations to the contrary. I 
suggest you add this to the TODO.

Your previous mails seem to indicate that this is a temporary 
situation. What are your plans exactly? And why did you start out this 
way? Did you look at interfacing with XPaths and see obvious problems 
that I do not know about?
Otherwise, I would tend to say that the first thing to do is to declare 
a new XPathParserContext at xmlDoc creation, attach it to the xmlDoc 
instance, and use it for all later queries. Does that make sense to 
you? I am trying to look at the pitfalls, here.

Marc-Antoine Parent

Marc-Antoine Parent | 21 Jan 2005 05:07

Some thoughts re XPath extension functions

OK, I have a first (barely) functional implementation of 
registerXPathExtensionFunc. (I sent it to Martijn, but it's not ready 
for checkin until some collective thinking happens.)

For one thing, I currently collect functions globally, and register 
them with the new XPathContext that is created on demand every time the 
xpath function is called. That operation is unfortunately slow.
I think that there are reasons that XPathContexts are created on demand: 
if your extension function calls the xpath method, I think it is 
necessary to use a new context. (Needs checking.) But then that means 
re-registering extensions, of course...

I first thought of keeping an XPathContext with the xmlDoc. It would 
save creation time, for one thing. But it means that I have to guard 
against the recursion problems above. (Yes, I have done such things, it 
is a real situation.) It also needs more care to guard against leaks. Do 
people here think it is worth the trouble?
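
To make the trade-off concrete, here is a toy model in plain Python. It is
only a sketch of the idea: ToyContext, register_function and evaluate are
made-up names, not lxml or libxml2 API. Every call builds a fresh context
and re-registers the global functions, which is what lets an extension
function safely re-enter the evaluator.

```python
# Toy model only: none of these names exist in lxml or libxml2.
GLOBAL_FUNCTIONS = {}


def register_function(name, func):
    """Collect extension functions globally, as in the first implementation."""
    GLOBAL_FUNCTIONS[name] = func


class ToyContext:
    created = 0  # counts how many contexts were built

    def __init__(self):
        ToyContext.created += 1
        self.functions = {}


def evaluate(func_name, *args):
    ctx = ToyContext()                      # new context on every call
    ctx.functions.update(GLOBAL_FUNCTIONS)  # re-registration cost, every time
    return ctx.functions[func_name](*args)


def double(x):
    return 2 * x


def quadruple(x):
    # An extension function that calls back into the evaluator.
    return evaluate("double", evaluate("double", x))


register_function("double", double)
register_function("quadruple", quadruple)

result = evaluate("quadruple", 3)
print(result, ToyContext.created)
```

One outer call here costs three contexts, because each nested call builds
its own. A context cached on the document would avoid that cost, but the
nested calls would then share state with the call still in progress.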

And I also thought that the extension functions should be registered 
with the document, and not globally. Do people agree this is a good 
thing? More complicated in some ways, but it would allow different 
documents to have different extension functions registered. Is this 
useful in real life? I cannot think of a use case. I would definitely 
like feedback on this issue.

Another alternative would be to make people manipulate XPathContext 
explicitly, and provide it (as an optional argument?) when calling 
XPath functions. I think that is ugly, and again I cannot see use 
cases for using two distinct sets of functions on a single document. 
Does anybody disagree?
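
The two alternatives can be sketched side by side in a toy model. Again
this is plain Python with hypothetical names (ToyDocument, call, context),
not a proposal for the actual lxml signatures: functions are registered per
document, and an optional explicit context can override them for one call.

```python
# Toy model only: ToyDocument and its methods are hypothetical, not lxml API.
class ToyDocument:
    def __init__(self, name):
        self.name = name
        self.functions = {}  # per-document registry instead of a global one

    def register_function(self, name, func):
        self.functions[name] = func

    def call(self, name, *args, context=None):
        # 'context' plays the role of an explicitly supplied XPathContext:
        # when given, its function table replaces the document's own.
        table = self.functions if context is None else context
        return table[name](*args)


doc_a = ToyDocument("a")
doc_b = ToyDocument("b")

# The same function name can mean different things on different documents.
doc_a.register_function("shout", str.upper)
doc_b.register_function("shout", lambda s: s + "!")

from_a = doc_a.call("shout", "hi")
from_b = doc_b.call("shout", "hi")

# Two distinct sets of functions on a single document, via an explicit context.
override = doc_a.call("shout", "hi", context={"shout": str.title})
print(from_a, from_b, override)
```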

Kapil Thangavelu | 21 Jan 2005 07:15

Re: Some thoughts re XPath extension functions


On Jan 20, 2005, at 10:07 PM, Marc-Antoine Parent wrote:

> OK, I have a first (barely) functional implementation of 
> registerXPathExtensionFunc. (I sent it to Martijn, but it's not ready 
> for checkin until some collective thinking happens.)
>

very cool!

> For one thing, I currently collect functions globally, and register 
> them with the new XPathContext that is created on demand every time 
> the xpath function is called. That operation is unfortunately slow.
> I think that there are reasons that XPathContexts are created on 
> demand: if your extension function calls the xpath method, I think it 
> is necessary to use a new context. (Needs checking.) But then that 
> means re-registering extensions, of course...
>
> I first thought of keeping a XPathContext with the xmlDoc. It would 
> save creation time, for one thing. But it means that I have to guard 
> against the recursion problems above. (Yes, I have done such things, 
> it is a real situation.) Also, more care to guard against leaks. Do 
> people here think it worth the trouble?
>

dunno.

> And I also thought that the extension functions should be registered 
> with the document, and not globally. Do people agree this is a good 
> thing? More complicated in some ways, but it would allow different 

Marc-Antoine Parent | 21 Jan 2005 14:25

Re: Some thoughts re XPath extension functions

>> And I also thought that the extension functions should be registered 
>> with the document, and not globally. Do people agree this is a good 
>> thing? More complicated in some ways, but it would allow different 
>> documents to have different extension functions registered. Is this 
>> useful in real life? I cannot think of a use case. I would definitely 
>> like feedback on this issue.
>
> i've been working on putting together an xsl engine in zope, i 
> originally went with pyana/xalan for this very reason, the ability to 
> have non global xpath extensions. as to give the extension functions, 
> access to a zope request context ( basically an http request) needed 
> access from a global perspective which was tricky, as well as 
> conditional availability of certain functions based on that context. 
> conditional availablility of certain functions based on that context. 
> i've since rewritten the engine ( since pyana doesn't allow for 
> returning nodesets from ext functions) to use libxml/libxslt and play 
> lots of thread local storage games to get access to the context (and 
> manage the global error handlers).
>
> anyways, i'd like to see the capability of non global registration of 
> extension functions, and i think the above is a valid use case, but 
> the lack thereof can be worked around. one abstraction that pyana has 
> that i like a lot is that of a reusable transformer object analogous 
> where functions, and transform aspects can be set and reused against a 
> given set of stylesheet transforms.
>
>> Another alternative would be to make people manipulate XPathContext 
>> explicitly, and provide it (as an optional argument?) when calling 
>> XPath functions.
>
> thats interesting.. if the xpathcontext is document stored, then they 

Kapil Thangavelu | 23 Jan 2005 12:05

Re: Some thoughts re XPath extension functions


On Jan 21, 2005, at 7:25 AM, Marc-Antoine Parent wrote:

>>> And I also thought that the extension functions should be registered  
>>> with the document, and not globally. Do people agree this is a good  
>>> thing? More complicated in some ways, but it would allow different  
>>> documents to have different extension functions registered. Is this  
>>> useful in real life? I cannot think of a use case. I would  
>>> definitely like feedback on this issue.
>>
>> i've been working on putting together an xsl engine in zope, i  
>> originally went with pyana/xalan for this very reason, the ability to  
>> have non global xpath extensions. as to give the extension functions,  
>> access to a zope request context ( basically an http request) needed  
>> access from a global perspective which was tricky, as well as  
>> conditional availability of certain functions based on that context.  
>> i've since rewritten the engine ( since pyana doesn't allow for  
>> returning nodesets from ext functions) to use libxml/libxslt and play  
>> lots of thread local storage games to get access to the context (and  
>> manage the global error handlers).
>>
>> anyways, i'd like to see the capability of non global registration of  
>> extension functions, and i think the above is a valid use case, but  
>> the lack thereof can be worked around. one abstraction that pyana has  
>> that i like a lot is that of a reusable transformer object analogous  
>> where functions, and transform aspects can be set and reused against  
>> a given set of stylesheet transforms.
>>
>>> Another alternative would be to make people manipulate XPathContext  
>>> explicitly, and provide it (as an optional argument?) when calling  

Marc-Antoine Parent | 23 Jan 2005 14:56

Re: Some thoughts re XPath extension functions

>> I doubt I can optimize around that, but I get the impression that, 
>> for the same document and/or stylesheet, the extension functions 
>> would always be the same functions, though they might need access to 
>> data that varies per-call. Is that right?
>
> yes.
>
>>  So maybe if we could somehow define access to a user-data parameter 
>> within the extension functions... Maybe from the python wrapper 
>> around the XPathParserContext parameter... But that also complicates 
>> the API, which is very much what lxml is working against. Still, it 
>> might be easier than exposing XPathContext manipulations in the API. 
>> Would you agree that is so?
>
> implementation wise it might be easier but as a better overall 
> approach, not really.. just to be clear, you're saying lxml should use 
> a userdata parameter as a workaround against global extension 
> functions needing local context, instead of local functions? ideally 
> for an easier api, i think a separate abstraction of a transformer to 
> wrap xpath parsing and transform context and functions, with an api 
> for registering local xpath functions would be a step forward.

Yes... Since that last letter, I have become more and more convinced 
that exposing the XPathContext was not a bad idea after all. We could 
declare an XPathContext, with extension functions and namespaces that 
would look like two dictionaries on the XPathContext.
That said, I am much less sure after looking at the xslt extension 
elements... The XSLT extension API is very different from the libxml 
XPath extension API, in that it uses global registration of modules. 
(There is registration against a transform context, but that seems 

Martijn Faassen | 24 Jan 2005 18:16

Re: Some thoughts re XPath extension functions

Marc-Antoine Parent wrote:
[snip]
> Yes... Since that last letter, I have become more and more convinced 
> that exposing the XPathContext was not a bad idea after all. We could 
> declare an XPathContext, with extension functions and namespaces that 
> would look like two dictionaries on the XPathContext.

Dropping into the middle of things, and having not followed most of this 
discussion yet (I haven't had the time yet!), I was considering exposing 
a separate XPath object, like I have a RelaxNG and XSLT object already. 
The xpath method could then be implemented in terms of this XPath 
object, and would just be a convenience thing.

The XPath object might have an XPathContext inside and you can
indeed register namespaces and functions and so on on it.
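
As a rough sketch of that shape, and nothing more: the class below is a
made-up ToyXPath built on the standard library's xml.etree.ElementTree, not
the real lxml object. It keeps namespaces and functions as two dictionaries
and lets a plain xpath() helper be a thin convenience wrapper around it.
The functions dictionary is only stored here, since ElementTree's findall()
has no way to call extension functions.

```python
import xml.etree.ElementTree as ET  # stand-in for lxml.etree


class ToyXPath:
    """Hypothetical reusable query object; not lxml's actual class."""

    def __init__(self, path, namespaces=None, functions=None):
        self.path = path
        self.namespaces = dict(namespaces or {})  # prefix -> namespace URI
        self.functions = dict(functions or {})    # stored only, never called

    def __call__(self, element):
        # Delegate to ElementTree's limited path support.
        return element.findall(self.path, self.namespaces or None)


def xpath(element, path, namespaces=None):
    """Convenience wrapper implemented in terms of the query object."""
    return ToyXPath(path, namespaces)(element)


root = ET.fromstring('<r xmlns:x="urn:example"><x:a/><x:a/><b/></r>')

query = ToyXPath("x:a", namespaces={"x": "urn:example"})
hits = query(root)       # the same query object can be reused on other trees
plain = xpath(root, "b")
print(len(hits), len(plain))
```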

Sorry if I'm saying something obvious or obviously wrong. :)

Regards,

Martijn
