Re: Unicode munging in element tag and text
Stefan Behnel <behnel_ml <at> gkec.informatik.tu-darmstadt.de>
2006-09-12 16:23:39 GMT
Hi Nicola,
Nicola Larosa wrote:
> This inconsistent behavior does not seem intentional. In my opinion, in the
> cases 1) and 2) lxml should work as it already does in the case 3), and as
> ElementTree always does.
At least under Python 2.x, lxml.etree will continue to return either unicode or
plain strings, depending on their content. Internally, everything is stored as
UTF-8, so this is a performance optimization: we can skip the unicode conversion
for plain ASCII strings, which are very common (just think of numeric data,
dates, etc.).
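The idea behind that optimization can be sketched roughly as follows. This is only an illustration, not lxml.etree's actual implementation. It is written in Python 3, with `bytes` standing in for the Python 2 plain `str`, and the function name `to_python_string` is hypothetical:

```python
def to_python_string(utf8_data: bytes):
    """Hypothetical sketch of an ASCII fast path (not lxml's real code).

    Text is assumed to be stored internally as UTF-8 bytes. Pure-ASCII
    data is returned unchanged as a byte string (the Python 2 'plain
    string' analogue), so no decoding work is done. Anything else is
    decoded into a unicode string.
    """
    if utf8_data.isascii():           # bytes.isascii() needs Python 3.7+
        return utf8_data              # ASCII: reuse the stored bytes as-is
    return utf8_data.decode("utf-8")  # non-ASCII: build a unicode object
```

With this sketch, `to_python_string(b"2006-09-12")` returns the byte string unchanged, while `to_python_string("Behnel \u00e9".encode("utf-8"))` returns a unicode string. That mixed return type is exactly the inconsistency that callers see.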
This may change in Python 3.x, but there will likely be more to change by then,
so that's out of our scope for now.
Stefan