Re: lxml removes line breaks from XML attributes
Stefan Behnel <stefan_ml <at> behnel.de>
2011-03-05 06:31:44 GMT
[accidentally sent to the wrong list]
Giovanni Torres, 04.03.2011 10:35:
> Some of you might not be happy with my question. But, I'm dealing with
> an XML file that has line breaks in XML attributes. I use lxml to
> parse the file, run some XPath queries, make changes to it and write
> it back. Unfortunately, lxml removes the line breaks from the
> attributes.
>
> Here is what I mean more clearly:
>
> $ cat example.xml
> <example attr="This is an attribute with several
> break
> lines"/>
>
> $ cat test.py
>
> import sys
> import lxml.etree
>
> xml = lxml.etree.parse(sys.stdin)
> xml.write(sys.stdout)
> print()
>
> $ python test.py< example.xml
> <example attr="This is an attribute with several break lines"/>()
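This is called "attribute-value normalisation" in the XML spec. The parser
replaces each literal line break in an attribute value with a space, so the
newlines are already gone by the time you see the tree:
http://www.w3.org/TR/REC-xml/#AVNormalize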
> Is there any way I can get lxml to write those line breaks back?
You should escape the newlines in attribute values as presented in the
spec, i.e. use character references such as "&#xA;" etc.
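A minimal sketch of the difference, assuming Python 3. The attribute text and the variable names are made up for illustration, and the stdlib ElementTree fallback is only there so that the snippet also runs without lxml installed:

```python
# Prefer lxml; fall back to the stdlib parser, which normalises the same way.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# Literal line breaks inside the attribute value (as in example.xml).
literal = '<example attr="several\nbreak\nlines"/>'
# The same value with the line breaks written as character references.
escaped = '<example attr="several&#xA;break&#xA;lines"/>'

# The parser turns each literal line break into a single space.
normalised = etree.fromstring(literal).get("attr")

# Character references are not normalised, so the newlines survive parsing.
root = etree.fromstring(escaped)
preserved = root.get("attr")

# On output the newlines are written back as character references.
serialised = etree.tostring(root, encoding="unicode")

print(repr(normalised))
print(repr(preserved))
print(serialised)
```

So the input file has to contain the references already. Once the literal line breaks have been parsed, there is nothing left in the tree to tell them apart from ordinary spaces.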
> I'm actually not sure if they are even legal. But, they seem to be
> according to this:
>
> http://stackoverflow.com/questions/449627/are-line-breaks-in-xml-attribute-values-valid
Well, technically, the example is "legal", as stated there, but it doesn't
give the requested result, because the literal line breaks are normalised
to spaces by the parser.
Stefan