8 Jan 2005 11:12
lxml.etree (ElementTree reimplementation) is getting there
Martijn Faassen <faassen <at> infrae.com>
2005-01-08 10:12:10 GMT
Hi there,

[resending this, as yesterday's mail never seems to have arrived]

The lxml.etree implementation of ElementTree, on top of libxml2, is getting there now. It features automatic memory management and quite a bit of ElementTree compatibility. Not all of the ElementTree API has been implemented yet, but enough for many use cases.

While debugging, I discovered that you need a recent version of libxml2 to make it all work without memory errors; earlier ones, like the version in my Debian unstable (2.6.11), apparently still contain some bugs. I'm testing with a more recent libxml2 release myself, so you may want to use one too if you want to play with this code.

You'll have to modify setup.py to make it use your installation of libxml2; the three variables to modify are at the top.

So, check it out (svn co http://codespeak.net/lxml/trunk lxml), compile it, and do a 'make test'. And tell me whether the tests pass on your machine!

Regards,

Martijn
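Because lxml.etree aims at ElementTree compatibility, code written against the shared API can run on either implementation. The following is a minimal sketch of that idea, using the common import-fallback idiom with today's lxml and the standard library's xml.etree.ElementTree. The 2005 snapshot discussed here may not support every call used, and the sample document is made up.

```python
# Use lxml.etree when it is installed, otherwise the standard library's
# ElementTree; the functions below only touch API the two have in common.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# Hypothetical sample document, loosely modelled on a Bible-verse file.
SAMPLE = b"<book><chapter n='1'><v>In the beginning</v><v>And the earth</v></chapter></book>"


def verse_texts(xml_bytes):
    """Return the text of every <v> element, in document order."""
    root = etree.fromstring(xml_bytes)
    return [v.text for v in root.iter("v")]


def chapter_number(xml_bytes):
    """Return the 'n' attribute of the first <chapter> element."""
    root = etree.fromstring(xml_bytes)
    return root.find("chapter").get("n")


print(verse_texts(SAMPLE))     # ['In the beginning', 'And the earth']
print(chapter_number(SAMPLE))  # 1
```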
14 Jan 2005 19:15
lxml progress
Martijn Faassen <faassen <at> infrae.com>
2005-01-14 18:15:45 GMT
Hi there,
Since some people seem to be actually reading this and some progress has
been made, I thought I'd give a report of what's been happening with lxml.
* Since last week, I've added a lot more of the ElementTree API, such as
the .find() function and friends, by directly using the code from
ElementTree.
* I am actually running the ElementTree and cElementTree test suites
now. I still need to disable some tests, but a significant fraction is
indeed running.
* I've improved the way libxml2's parser functionality gets used, in
order to implement ElementTree's top-level parse() function.
* I've added XPath support to lxml.etree! An example of what you can do:
>>> from lxml import etree
>>> tree = etree.parse('ot.xml')
>>> tree.xpath('(//v)[5]/text()')
[u'And God called the light Day, and the darkness he called Night. And the evening and the morning were the first day.\n']
or, say, this, modifying the elements returned:
>>> result = tree.xpath('(//v)[5]')
>>> result[0].text = 'The day and night verse.'
>>> tree.xpath('(//v)[5]/text()')
[u'The day and night verse.']
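The parentheses in `(//v)[5]` matter: they select the fifth `v` element in document order, whereas `//v[5]` would select every `v` that is the fifth `v` child of its parent. For comparison, here is a sketch using only the standard library's xml.etree.ElementTree, whose find(), findall() and findtext() are the ElementTree path code mentioned above. The miniature document stands in for ot.xml and its structure is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for ot.xml: two chapters of three <v> verses each.
DOC = """<book>
  <chapter><v>v1</v><v>v2</v><v>v3</v></chapter>
  <chapter><v>v4</v><v>v5</v><v>v6</v></chapter>
</book>"""

root = ET.fromstring(DOC)

# The find() family takes ElementTree's limited path syntax.
first_chapter = root.find("chapter")   # first <chapter> child
all_verses = root.findall(".//v")      # every <v>, in document order
first_text = root.findtext(".//v")     # text of the first <v>

# XPath's (//v)[5] is "the fifth <v> in document order"...
fifth = all_verses[4]
print(fifth.text)                 # v5

# ...whereas .//v[5] means "each <v> that is the fifth <v> child of its
# parent", and no chapter here has five verses.
print(root.findall(".//v[5]"))    # []

# The returned elements are live: changing one changes the tree.
fifth.text = "The day and night verse."
print(root.findall(".//v")[4].text)  # The day and night verse.
```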
20 Jan 2005 19:24
lxml XPath
Marc-Antoine Parent <maparent <at> mac.com>
2005-01-20 18:24:02 GMT
>> I have started looking at what is missing for me to use this in my
>> own projects, and the only big piece I could find missing for my
>> purposes was XPathRegisterFunction. So I am starting to hack at it.
>> (I know Python well, libxml2 somewhat though mostly as a user, and I
>> am new to Pyrex. Great opportunity to learn, I love it so far.)
>
> Ah, great, a contributor! :)

!!! I am sure those are always scarce. I cannot put in tons of time either, but I am looking around at this point.

First, I (finally) realized that the XPath support is in Python and not that of libxml. Oops! I had mistaken expectations to the contrary. I suggest you add this to the TODO. Your previous mails seem to indicate that this is a temporary situation. What are your plans exactly? And why did you start out this way? Did you look at interfacing with libxml's XPath support and see obvious problems that I do not know about?

Otherwise, I would tend to say that the first thing to do is to declare a new XPathParserContext at xmlDoc creation, attach it to the xmlDoc instance, and use it for all later queries. Does that make sense to you? I am trying to look at the pitfalls here.

Marc-Antoine Parent
21 Jan 2005 05:07
Some thoughts re XPath extension functions
Marc-Antoine Parent <maparent <at> mac.com>
2005-01-21 04:07:44 GMT
OK, I have a first (barely) functional implementation of registerXPathExtensionFunc. (I sent it to Martijn, but it's not ready for check-in until some collective thinking happens.)

For one thing, I currently collect functions globally, and register them with the new XPathContext that is created on demand every time the xpath function is called. That operation is unfortunately slow. I think there are reasons that XPathContexts are created on demand: if your extension function calls the xpath method, I think it is necessary to use a new context. (Needs checking.) But then that means re-registering extensions, of course...

I first thought of keeping an XPathContext with the xmlDoc. It would save creation time, for one thing. But it means that I have to guard against the recursion problems above. (Yes, I have done such things; it is a real situation.) It also takes more care to guard against leaks. Do people here think it is worth the trouble?

I also thought that the extension functions should be registered with the document, and not globally. Do people agree this is a good thing? It is more complicated in some ways, but it would allow different documents to have different extension functions registered. Is this useful in real life? I cannot think of a use case. I would definitely like feedback on this issue.

Another alternative would be to make people manipulate the XPathContext explicitly, and provide it (as an optional argument?) when calling XPath functions. I think that is ugly, and again I cannot see use cases for using two distinct sets of functions on a single document. Does anybody disagree?
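The trade-off between a fresh context per call and one context kept with the document can be illustrated with a toy model. The sketch below is hypothetical throughout: `ToyContext` and `ToyDoc` stand in for libxml2's XPath context and an lxml document, and a "query" is just a call to a named extension function. It shows both costs discussed above: the per-call path copies the global registry into every new context, and the shared path lets a re-entrant call from inside an extension function overwrite the outer call's state.

```python
# Hypothetical global registry: extension name -> Python callable.
EXTENSIONS = {}


class ToyContext:
    """Toy stand-in for an XPath evaluation context (not libxml2's API)."""

    def __init__(self):
        self.functions = dict(EXTENSIONS)  # "re-register" every extension
        self.current = None                # per-evaluation mutable state


class ToyDoc:
    """Toy document that either shares one context or makes one per query."""

    def __init__(self, share_context):
        self.share_context = share_context
        self._shared = ToyContext() if share_context else None

    def xpath(self, func_name, node):
        ctx = self._shared if self.share_context else ToyContext()
        ctx.current = node
        value = ctx.functions[func_name](self, node)
        # Return the result plus the context node as seen *after* the call.
        return value, ctx.current


def inner(doc, node):
    return "inner"


def outer(doc, node):
    # An extension function that runs another query on the same document.
    inner_value, _ = doc.xpath("inner", node + "/child")
    return "outer(" + inner_value + ")"


EXTENSIONS["inner"] = inner
EXTENSIONS["outer"] = outer

print(ToyDoc(share_context=False).xpath("outer", "root"))  # ('outer(inner)', 'root')
print(ToyDoc(share_context=True).xpath("outer", "root"))   # ('outer(inner)', 'root/child')
```

In the shared case the outer evaluation resumes with its context node clobbered by the nested query, which is the recursion problem a cached context would have to guard against, for example by saving and restoring its state around each call.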
21 Jan 2005 07:15
Re: Some thoughts re XPath extension functions
Kapil Thangavelu <hazmat <at> objectrealms.net>
2005-01-21 06:15:22 GMT
On Jan 20, 2005, at 10:07 PM, Marc-Antoine Parent wrote:

> OK, I have a first (barely) functional implementation of
> registerXPathExtensionFunc. (I sent it to Martijn, but it's not ready
> for checkin until some collective thinking happens.)
>

very cool!

> For one thing, I currently collect functions globally, and register
> them with the new XPathContext that is created on demand every time
> the xpath function is called. That operation is unfortunately slow.
> I think that there are reasons that XPathContext are created on
> demand: If your extension function calls the xpath method, I think it
> is necessary to use a new context. (Needs checking.) But then that
> means re-registering extensions, of course...
>
> I first thought of keeping a XPathContext with the xmlDoc. It would
> save creation time, for one thing. But it means that I have to guard
> against the recursion problems above. (Yes, I have done such things,
> it is a real situation.) Also, more care to guard against leaks. Do
> people here think it worth the trouble?
>

dunno.

> And I also thought that the extension functions should be registered
> with the document, and not globally. Do people agree this is a good
> thing? More complicated in some ways, but it would allow different
21 Jan 2005 14:25
Re: Some thoughts re XPath extension functions
Marc-Antoine Parent <maparent <at> mac.com>
2005-01-21 13:25:16 GMT
>> And I also thought that the extension functions should be registered
>> with the document, and not globally. Do people agree this is a good
>> thing? More complicated in some ways, but it would allow different
>> documents to have different extension functions registered. Is this
>> useful in real life? I cannot think of a use case. I would definitely
>> like feedback on this issue.
>
> i've been working on putting together an xsl engine in zope, i
> originally went with pyana/xalan for this very reason, the ability to
> have non global xpath extensions. as to give the extension functions,
> access to a zope request context ( basically an http request) needed
> access from a global perspective which was tricky, as well as
> conditional availability of certain functions based on that context.
> i've since rewritten the engine ( since pyana doesn't allow for
> returning nodesets from ext functions) to use libxml/libxslt and play
> lots of thread local storage games to get access to the context (and
> manage the global error handlers).
>
> anyways, i'd like to see the capability of non global registration of
> extension functions, and i think the above is a valid use case, but
> the lack thereof can be worked around. one abstraction that pyana has
> that i like a lot is that of a reusable transformer object analogous
> where functions, and transform aspects can be set and reused against a
> given set of stylesheet transforms.
>
>> Another alternative would be to make people manipulate XPathContext
>> explicitly, and provide it (as an optional argument?) when calling
>> XPaths functions.
>
> that's interesting.. if the xpathcontext is document stored, then they
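Kapil's workaround, giving globally registered extension functions access to per-request data through thread-local storage, can be sketched with Python's `threading.local`. The registry, the `current-user` function name and the request handler below are hypothetical; a real engine would store the request data before running a transform and libxslt would invoke the function, rather than the handler calling it directly.

```python
import threading

# Per-thread storage: each request-handling thread sees its own attributes.
_request_state = threading.local()


def current_user():
    """A globally registered extension function that still sees per-request data."""
    return getattr(_request_state, "user", None)


# Hypothetical global registry of extension functions.
EXTENSIONS = {"current-user": current_user}


def handle_request(user, results, index):
    # Stash this request's data before running any queries...
    _request_state.user = user
    # ...so the global function, called later in the same thread, can find it.
    results[index] = EXTENSIONS["current-user"]()


users = ["alice", "bob", "carol"]
results = [None] * len(users)
threads = [
    threading.Thread(target=handle_request, args=(user, results, i))
    for i, user in enumerate(users)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)         # ['alice', 'bob', 'carol']
print(current_user())  # None (the main thread never set a user)
```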
23 Jan 2005 12:05
Re: Some thoughts re XPath extension functions
Kapil Thangavelu <hazmat <at> objectrealms.net>
2005-01-23 11:05:18 GMT
On Jan 21, 2005, at 7:25 AM, Marc-Antoine Parent wrote:

>>> And I also thought that the extension functions should be registered
>>> with the document, and not globally. Do people agree this is a good
>>> thing? More complicated in some ways, but it would allow different
>>> documents to have different extension functions registered. Is this
>>> useful in real life? I cannot think of a use case. I would
>>> definitely like feedback on this issue.
>>
>> i've been working on putting together an xsl engine in zope, i
>> originally went with pyana/xalan for this very reason, the ability to
>> have non global xpath extensions. as to give the extension functions,
>> access to a zope request context ( basically an http request) needed
>> access from a global perspective which was tricky, as well as
>> conditional availability of certain functions based on that context.
>> i've since rewritten the engine ( since pyana doesn't allow for
>> returning nodesets from ext functions) to use libxml/libxslt and play
>> lots of thread local storage games to get access to the context (and
>> manage the global error handlers).
>>
>> anyways, i'd like to see the capability of non global registration of
>> extension functions, and i think the above is a valid use case, but
>> the lack thereof can be worked around. one abstraction that pyana has
>> that i like a lot is that of a reusable transformer object analogous
>> where functions, and transform aspects can be set and reused against
>> a given set of stylesheet transforms.
>>
>>> Another alternative would be to make people manipulate XPathContext
>>> explicitly, and provide it (as an optional argument?) when calling
23 Jan 2005 14:56
Re: Some thoughts re XPath extension functions
Marc-Antoine Parent <maparent <at> mac.com>
2005-01-23 13:56:15 GMT
>> I doubt I can optimize around that, but I get the impression that,
>> for the same document and/or stylesheet, the extension functions
>> would always be the same functions, though they might need access to
>> data that varies per-call. Is that right?
>
> yes.
>
>> So maybe if we could somehow define access to a user-data parameter
>> within the extension functions... Maybe from the python wrapper
>> around the XPathParserContext parameter... But that also complicates
>> the API, which is very much what lxml is working against. Still, it
>> might be easier than exposing XPathContext manipulations in the API.
>> Would you agree that is so?
>
> implementation wise it might be easier but as a better overall
> approach, not really.. just to be clear, you're saying lxml should use
> a userdata parameter as a workaround against global extension
> functions needing local context, instead of local functions? ideally
> for an easier api, i think a separate abstraction of a transformer to
> wrap xpath parsing and transform context and functions, with an api
> for registering local xpath functions would be a step forward.

Yes... Since that last letter, I have become more and more convinced that exposing the XPathContext was not a bad idea after all. We could declare an XPathContext, with extension functions and namespaces that would look like two dictionaries on the XPathContext.

That said, I am much less sure after looking at the XSLT extension elements... The XSLT extension API is very different from the libxml XPath extension API, in that it uses global registration of modules. (There is registration against a transform context, but that seems
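The "two dictionaries" idea can be sketched as a small context object that holds extension functions and namespace prefixes, with each extension function receiving the context as its first argument so that per-call data travels with the context instead of through global state. Everything here (`XPathContext`, `call()`, `user_data`, the `greeting` function and the prefix URI) is hypothetical; later lxml releases did adopt a similar shape, passing a context object as the first argument of every XPath extension function, but this is not that API.

```python
class XPathContext:
    """Hypothetical context exposing extensions and namespaces as plain dicts."""

    def __init__(self, user_data=None):
        self.functions = {}   # extension name -> callable(context, *args)
        self.namespaces = {}  # prefix -> namespace URI
        self.user_data = user_data

    def call(self, name, *args):
        # Extension functions get the context first, so they can reach
        # per-call data (user_data) without any global state.
        return self.functions[name](self, *args)


def greeting(context, name):
    salutation = (context.user_data or {}).get("salutation", "Hello")
    return "%s, %s" % (salutation, name)


# Two contexts on the same data can carry different functions and data.
plain = XPathContext()
plain.functions["greeting"] = greeting

request = XPathContext(user_data={"salutation": "Hi"})
request.functions["greeting"] = greeting
request.namespaces["z"] = "http://example.com/zope"  # hypothetical prefix

print(plain.call("greeting", "Kapil"))    # Hello, Kapil
print(request.call("greeting", "Kapil"))  # Hi, Kapil
print(sorted(request.namespaces))         # ['z']
```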
24 Jan 2005 18:16
Re: Some thoughts re XPath extension functions
Martijn Faassen <faassen <at> infrae.com>
2005-01-24 17:16:09 GMT
Marc-Antoine Parent wrote:
[snip]
> Yes... Since that last letter, I was leaning more and more convinced
> that exposing the XPathContext was not a bad idea after all. We could
> declare a XPathContext, with extension functions and namespaces that
> would look as two dictionaries on the XPathContext.

Dropping into the middle of things, and not having followed most of this discussion yet (I haven't had the time!), I was considering exposing a separate XPath object, just as I already have RelaxNG and XSLT objects. The xpath method could then be implemented in terms of this XPath object, and would just be a convenience. The XPath object might have an XPathContext inside, and you could indeed register namespaces and functions and so on on it.

Sorry if I'm saying something obvious or obviously wrong. :)

Regards,

Martijn
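The shape Martijn describes, a reusable XPath object configured once with its namespaces, plus an xpath() convenience implemented on top of it, can be sketched as follows. Later lxml releases do ship an `etree.XPath` class along these lines (taking `namespaces` and `extensions` arguments), but this sketch deliberately uses only the standard library: the `XPath` class and `xpath()` function here are hypothetical, and evaluation delegates to ElementTree's findall(), which supports only a subset of XPath and no extension functions.

```python
import xml.etree.ElementTree as ET


class XPath:
    """Hypothetical reusable path object, analogous to the RelaxNG/XSLT objects.

    The expression and its namespace prefixes are configured once and reused.
    Evaluation delegates to ElementTree's findall(), so only ElementTree's
    XPath subset is supported.
    """

    def __init__(self, path, namespaces=None):
        self.path = path
        self.namespaces = dict(namespaces or {})

    def __call__(self, element):
        return element.findall(self.path, self.namespaces)


def xpath(element, path, namespaces=None):
    """Convenience function implemented in terms of the XPath object."""
    return XPath(path, namespaces)(element)


doc = ET.fromstring(
    '<r xmlns:b="urn:example:bible"><b:v>one</b:v><b:v>two</b:v></r>'
)
verses = XPath(".//b:v", namespaces={"b": "urn:example:bible"})
print([v.text for v in verses(doc)])               # ['one', 'two']
print(len(xpath(doc, ".//{urn:example:bible}v")))  # 2
```

Keeping the configuration on the object means a path can be built once and applied to many documents, while the method on the element stays a thin wrapper.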