Stefan Behnel | 1 Jul 2006 14:53

Re: let lxml write the ?xml pi

Hi Albert,

Albert Brandl wrote:
> I started using lxml some weeks ago, and have been lurking on the
> mailing list for some time now. Recently I had the problem that the xml
> prologue is not included by default, and stumbled over the following
> mail:
> 
> On Mon, Jun 19, 2006 at 12:48:37PM +0200, Martijn Faassen wrote:
>> I.e., try the following:
>>
>> etree.tostring(t, 'utf-8', xml_declaration=True)
> 
> Is there any reason that the method write_c14n() does not support this
> flag? The canonical form is a bit more readable, therefore I'd prefer
> to use this method.

As the documentation of the write_c14n() method states, it always writes UTF-8
encoded byte streams, so there is no real need for the prologue. I wouldn't
mind adding this, though. Things like 'standalone' and the XML version would
otherwise not be available in the output.

BTW, if it's about the readability, pretty printing might be closer to what
you want anyway.

Stefan
Albert Brandl | 11 Jul 2006 17:57

Re: let lxml write the ?xml pi

Hi Stefan,

On Sat, Jul 01, 2006 at 02:53:03PM +0200, Stefan Behnel wrote:
> As the documentation of the write_c14n() method states, it always writes UTF-8
> encoded byte streams, so there is no real need for the prologue. I wouldn't
> mind adding this, though. Things like 'standalone' and the XML version would
> otherwise not be available in the output.

I recently learned about section 4.1 of the C14N recommendation,
http://www.w3.org/TR/xml-c14n#NoXMLDecl, which states that the canonical
form does not contain a prologue. Therefore, write_c14n() is ok - sorry
for the request.
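
For reference, a minimal sketch of the difference, assuming a current lxml
release (the element names are made up for illustration). tostring() writes
the declaration only on request, while write_c14n() always writes UTF-8 bytes
with no declaration:

```python
from io import BytesIO

from lxml import etree

# Hypothetical two-element tree, used only for illustration.
root = etree.Element("root")
etree.SubElement(root, "child").text = "text"

# Regular serialisation: the XML declaration is written only on request.
with_decl = etree.tostring(root, encoding="utf-8", xml_declaration=True)

# Canonical serialisation: UTF-8 bytes, and no XML declaration.
buf = BytesIO()
etree.ElementTree(root).write_c14n(buf)
canonical = buf.getvalue()

print(with_decl.decode("utf-8"))
print(canonical.decode("utf-8"))
```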

> BTW, if it's about the readability, pretty printing might be closer to what
> you want anyway.

Thanks for the hint. In lxml 1.0.1, the pretty printed version adds
information about the namespace to every tag. Unfortunately, this
decreases the readability, since in my case, almost all tags have a
namespace. A "pretty_print" flag for write_c14n() would be a 
perfect workaround, though :-)

Best regards,

Albert
Stefan Behnel | 11 Jul 2006 18:34

Re: let lxml write the ?xml pi

Hi Albert,

Albert Brandl wrote:
> On Sat, Jul 01, 2006 at 02:53:03PM +0200, Stefan Behnel wrote:
>> As the documentation of the write_c14n() method states, it always writes UTF-8
>> encoded byte streams, so there is no real need for the prologue. I wouldn't
>> mind adding this, though. Things like 'standalone' and the XML version would
>> otherwise not be available in the output.
> 
> I recently learned about section 4.1 of the C14N recommendation,
> http://www.w3.org/TR/xml-c14n#NoXMLDecl, which states that the canonical
> form does not contain a prologue. Therefore, write_c14n() is ok - sorry
> for the request.

Thought so. Thanks for checking.

>> BTW, if it's about the readability, pretty printing might be closer to what
>> you want anyway.
> 
> Thanks for the hint. In lxml 1.0.1, the pretty printed version adds
> information about the namespace to every tag.

Not on my side. How do you build the tree?
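
One way of building a tree that keeps the namespace declaration in one place
is sketched below. It assumes a current lxml release; the namespace URI and
tag names are hypothetical, and this is not necessarily how either tree in
this thread was built. The namespace is declared once on the root through
nsmap, and the children inherit it:

```python
from lxml import etree

NS = "http://example.com/ns"  # hypothetical namespace URI

# Declare the default namespace once on the root element.
root = etree.Element("{%s}root" % NS, nsmap={None: NS})
child = etree.SubElement(root, "{%s}child" % NS)
etree.SubElement(child, "{%s}leaf" % NS).text = "value"

# Children reuse the root's declaration instead of repeating it.
pretty = etree.tostring(root, pretty_print=True).decode("ascii")
print(pretty)
```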

> Unfortunately, this
> decreases the readability, since in my case, almost all tags have a
> namespace. A "pretty_print" flag for write_c14n() would be a 
> perfect workaround, though :-)

I don't think that's gonna happen. C14N is meant to be a well-defined XML

Gerald John M. Manipon | 13 Jul 2006 08:26

tostring() escapes and adding cdata section

Hi,

Quick question:  How can I prevent the escaping (specifically '&' into
'&amp;') that occurs when I use tostring()?

i.e.

 >>> from lxml.etree import *
 >>> r = Element('root')
 >>> s = SubElement(r,'sub')
 >>> s.text = 'http://test/cgi-bin/test.cgi?a=123.2&b=asdfe&b="3asd"'
 >>> s.text
'http://test/cgi-bin/test.cgi?a=123.2&b=asdfe&b="3asd"'
 >>> tostring(s)
'<sub>http://test/cgi-bin/test.cgi?a=123.2&amp;b=asdfe&amp;b="3asd"</sub>'

I'm currently just doing a .replace('&amp;','&') on the string I get
back.

Also, is there a way to specify that an element's text should be
enclosed as a CDATA?

Thanks for any help.

Gerald
Stefan Behnel | 13 Jul 2006 08:35

Re: tostring() escapes and adding cdata section

Hi Gerald,

Gerald John M. Manipon wrote:
> Quick question:  How can I prevent the escaping (specifically '&' into
> '&amp;') that occurs when I use tostring()?

You (obviously) can't. The output would not be (well-formed) XML.

> I'm currently just doing a .replace('&amp;','&') on the string I get
> back.

You can use 'unescape' from the xml.sax.saxutils module.
http://docs.python.org/lib/module-xml.sax.saxutils.html

But why don't you use lxml itself? Unescaping is done automatically when you
parse the string.
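
A small sketch of that round trip, assuming a current lxml release and reusing
the URL from the example quoted below. The escaping exists only in the
serialised bytes; the parsed text is the original string again:

```python
from xml.sax.saxutils import unescape

from lxml import etree

url = 'http://test/cgi-bin/test.cgi?a=123.2&b=asdfe&b="3asd"'

sub = etree.Element("sub")
sub.text = url

serialized = etree.tostring(sub)       # '&' is written as '&amp;'
parsed = etree.fromstring(serialized)  # the parser undoes the escaping

print(serialized)
print(parsed.text == url)

# unescape() does the same on a plain string, outside of any parser.
print(unescape("a=1&amp;b=2"))
```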

>  >>> from lxml.etree import *
>  >>> r = Element('root')
>  >>> s = SubElement(r,'sub')
>  >>> s.text = 'http://test/cgi-bin/test.cgi?a=123.2&b=asdfe&b="3asd"'
>  >>> s.text
> 'http://test/cgi-bin/test.cgi?a=123.2&b=asdfe&b="3asd"'
>  >>> tostring(s)
> '<sub>http://test/cgi-bin/test.cgi?a=123.2&amp;b=asdfe&amp;b="3asd"</sub>'

So? What's the problem? That's perfect XML. Any XML parser will be able to
handle that.

> Also, is there a way to specify that an element's text should be

Gerald John M. Manipon | 13 Jul 2006 10:09

Re: tostring() escapes and adding cdata section


Stefan Behnel wrote:
> Hi Gerald,
> 
> Gerald John M. Manipon wrote:
>> Quick question:  How can I prevent the escaping (specifically '&' into
>> '&amp;') that occurs when I use tostring()?
> 
> You (obviously) can't. The output would not be (well-formed) XML.

Okay.

> 
> 
>> I'm currently just doing a .replace('&amp;','&') on the string I get
>> back.
> 
> You can use 'unescape' from the xml.sax.saxutils module.
> http://docs.python.org/lib/module-xml.sax.saxutils.html

I'll look into that.

> 
> But why don't you use lxml itself? Unescaping is done automatically when you
> parse the string.
> 
> 
>>  >>> from lxml.etree import *
>>  >>> r = Element('root')
>>  >>> s = SubElement(r,'sub')

Stefan Behnel | 13 Jul 2006 10:45

Re: tostring() escapes and adding cdata section

Hi Gerald,

Gerald John M. Manipon wrote:
> We're posting our xml that we get from tostring() to
> one of our partner's web services (I don't know the exact backend but
> it looks Java-based) and their services do not like the '&amp;'.  I
> guess it's a problem on their end.

Oh, definitely:
http://www.w3.org/TR/2004/REC-xml-20040204/#syntax

"""
The ampersand character (&) and the left angle bracket (<) MUST NOT appear in
their literal form, except when used as markup delimiters, or within a
comment, a processing instruction, or a CDATA section. If they are needed
elsewhere, they MUST be escaped using either numeric character references or
the strings "&amp;" and "&lt;" respectively.
"""

If it doesn't work for them, they should start using an XML parser (which is
the best choice for parsing XML anyway...)
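
To illustrate the rule, here is a quick check with the parser from the Python
standard library (xml.etree.ElementTree, used here instead of lxml only to
keep the sketch dependency-free; the sample strings are made up). The escaped
form parses back to a plain '&', while the unescaped form is rejected as not
well-formed:

```python
import xml.etree.ElementTree as ET

escaped = "<sub>test.cgi?a=1&amp;b=2</sub>"  # well-formed
raw = "<sub>test.cgi?a=1&b=2</sub>"          # literal '&' in text content

# The escaped form parses, and the text holds a plain '&' again.
parsed_text = ET.fromstring(escaped).text
print(parsed_text)

# The unescaped form is rejected by the parser.
try:
    ET.fromstring(raw)
    raw_accepted = True
except ET.ParseError as exc:
    raw_accepted = False
    print("rejected:", exc)
```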

Stefan
Stefan Behnel | 14 Jul 2006 21:24

a C-level API for lxml

Hi all,

as part of a project on lxml, I'm building an external element API module
(objectify style) as a Pyrex extension. To make this independent of lxml
itself, I decided to add an external C-level API that allows external modules
to efficiently interface with the lxml module. Usage in other modules will be
as easy as including a header file or cimporting a .pxd file in Pyrex, and
then calling an init function from the external module. The match is done by
comparing char* strings for the function names at initialisation time, so this
is pretty future proof (no missing symbols when the C API changes etc.).

This requires some changes in Pyrex, so lxml 1.1 will depend on a patched
version (again), until (one day) my patches are accepted upstream. I also
published some Python 2.5 related fixes, BTW, to make lxml 1.1 run nicely on
Python 2.5. I can't currently test that since I can't get the 2.5 beta
versions to work on my machine (broken compiled-in PYTHONPATH). Anyway, at
least I got positive feedback that the exception stuff seems to be fixed. The
Py_ssize_t fixes are not verified on 2.5, but should also work.

A preliminary version of the patched Pyrex is here:
http://codespeak.net/svn/lxml/pyrex/
So, if someone could test lxml with it under 2.5 (preferably on a 64-bit
machine) ...

When the lxml C-API is in place, it will be easy to add new functions to it
(basically by adding the "public" keyword to a Pyrex C function). So I'd be
glad if everyone who thinks this API would be useful for them could propose
more functions to be made public. I know specifically that Andreas had a
problem with extending the XPath implementation, so maybe there are ways to
get this solved at the C level. This thread is the right place to discuss

Petr van Blokland | 16 Jul 2006 00:34

Re: Python values in xpath functions

Hi, 
maybe someone can help me out.
I am returning an etree from a Python function in XPath.
But stepping through the result does not seem to work,
as in <xsl:for-each select="myfunction()/*">...</xsl:for-each>,
whereas <xsl:for-each select="*">...</xsl:for-each> works fine
for the current node. What am I doing wrong? Should the function
return something other than an etree, as in:

def myfunction(dummy, *args):
    ...  # create etree from args
    return etree

Kind regards,
Petr van Blokland

----------------------------------------------
Petr van Blokland
buro <at> petr.com | www.petr.com | +31 15 219 10 40 
----------------------------------------------



_______________________________________________
lxml-dev mailing list
lxml-dev <at> codespeak.net
http://codespeak.net/mailman/listinfo/lxml-dev
Stefan Behnel | 16 Jul 2006 09:47

Re: Python values in xpath functions

Hi Petr,

Petr van Blokland wrote:
> I am returning an etree from a Python function in XPath.

"etree" is the name of the module. I guess you mean an ElementTree object?

> But stepping through the result does not seem to work,
> as in <xsl:for-each select="myfunction()/*">...</xsl:for-each>,
> whereas <xsl:for-each select="*">...</xsl:for-each> works fine
> for the current node. What am I doing wrong?

Don't return an ElementTree (don't you get an exception for that anyway?).
Return an Element or a list of Elements.
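
A rough sketch of what that could look like, assuming a current lxml release.
The function body and the element names "result" and "item" are hypothetical;
only the name myfunction comes from the question. The function is registered
without a namespace through FunctionNamespace(None) and returns a list of
Elements, so the XPath expression can step into the result:

```python
from lxml import etree


def myfunction(context, *args):
    # Build a plain Element (not an ElementTree) from the arguments.
    result = etree.Element("result")
    for value in args:
        etree.SubElement(result, "item").text = str(value)
    # Return a list of Elements so XPath receives a node-set.
    return [result]


# Register the function in the default (empty) function namespace.
ns = etree.FunctionNamespace(None)
ns["myfunction"] = myfunction

doc = etree.XML("<root/>")

# Step into the children of the returned element.
items = doc.xpath("myfunction('a', 'b')/*")
texts = [item.text for item in items]
print(texts)
```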

Stefan
