3 Dec 2011 09:41
Re: Attribute encoding.
Stefan Behnel <stefan_ml <at> behnel.de>
2011-12-03 08:41:03 GMT
Evgeny Turnaev, 28.11.2011 12:01:
> Is there any reason why attributes of Element returned as bytestring
> if only contains ascii?

Yes. Partly for ElementTree compatibility and partly because it's faster and more memory friendly under Python 2.x.

Also note that it's not just attribute names and values. All string values work this way in lxml.

> In my application i need it to be unicode always.

In Python 3, lxml will always give you Unicode strings. In Python 2, ASCII encoded byte strings are compatible with the equivalent Unicode strings (as long as the platform default encoding is ASCII-compatible, which is "normally" the case), so you will rarely notice the difference in your code.

> Is there a way to force lxml return element attribute as unicode?

No.

> What is the preferred way of getting attributes as unicode?

If you really need a unicode string in Py2, you can do "unicode(value)" or "u'' + value".

Stefan

_________________________________________________________________
Mailing list for the lxml Python XML toolkit - http://lxml.de/
lxml <at> lxml.de
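
For code that has to treat every value as Unicode on both Python 2 and Python 3, the conversion suggested above could be wrapped roughly as follows. This is only a sketch: `to_text` and `unicode_attrib` are hypothetical helper names, not part of lxml, and the explicit ASCII decode is an assumption based on the statement that byte strings are only returned for ASCII-only content. Unlike "unicode(value)", it does not depend on the platform default encoding.

```python
import sys

# Python 2 has a separate ``unicode`` type; on Python 3, ``str`` is already Unicode.
# The name ``unicode`` is only evaluated when running under Python 2.
text_type = unicode if sys.version_info[0] == 2 else str  # noqa: F821


def to_text(value, encoding="ascii"):
    """Return ``value`` as a Unicode string (hypothetical helper, not an lxml API)."""
    if value is None or isinstance(value, text_type):
        # Missing values and strings that are already Unicode pass through unchanged.
        return value
    # Otherwise assume a plain byte string and decode it explicitly.
    return value.decode(encoding)


def unicode_attrib(attrib):
    """Copy an attribute mapping, converting every name and value to Unicode."""
    return {to_text(name): to_text(value) for name, value in attrib.items()}
```

With an lxml element this would be called as `to_text(element.get("name"))` or `unicode_attrib(element.attrib)`. Under Python 3 both helpers simply return the strings they are given.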
