jens quade | 16 Oct 13:40 2014

XMLParser mode resolve_entities=False and entities in attributes

Hi,

This has been discussed before (in 11/2009), but the bug seems to persist, so I will try to document it again:

If an XML parser is created with XMLParser(resolve_entities=False), and the document declares an
external DTD, then entities in attributes are inserted into the parent element (if a parent element
exists) directly before the element containing that attribute.

Expected behaviour:

- an error, because the entities are undeclared; or, more useful in some cases:
- the entities stay in their attributes.

Workarounds:

- Declare an internal DTD that defines all entities

- Use an actual external DTD *and* use dtd_validation=True with XMLParser
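The first workaround can be sketched as follows. This is a minimal sketch that assumes lxml is installed. The entity value "&#246;" for ouml is an illustrative choice. Once the entity is declared in an internal DTD subset, resolve_entities=False keeps the reference in element content as an entity node instead of expanding it. Attribute values are left out on purpose, because they are what this report is about.

```python
from lxml import etree

# Keep entity references instead of expanding them.
parser = etree.XMLParser(resolve_entities=False)

# Workaround: declare every entity the document uses in an internal DTD subset.
doc = b"""<!DOCTYPE test [
  <!ENTITY ouml "&#246;">
]>
<test>1<a>&ouml;</a></test>"""

root = etree.fromstring(doc, parser)

# The reference inside <a> is kept as an entity node and serialised back as-is.
serialised = etree.tostring(root)
print(serialised)
```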

Sample code (see also http://pastebin.com/24bM98La for more examples):

from lxml import etree

# Keep entity references instead of expanding them
parser = etree.XMLParser(resolve_entities=False)

try:
    tree = etree.XML("""<test>1<a href="&uuml;bel">&ouml;</a></test>""", parser=parser)
except etree.XMLSyntaxError as e:
    print(e)

Output:

Pegerto Fernández | 3 Oct 12:44 2014

Issue writing with lxml 3.4 on Python 3.2 on Windows

Hello Team,

I am trying to use lxml to canonicalise (C14N) a document:

        c14ndoc = io.StringIO()
        tree.write_c14n(c14ndoc)

I am having this issue:

    tree.write_c14n(c14ndoc)
  File "lxml.etree.pyx", line 2271, in lxml.etree._ElementTree.write_c14n (src\lxml\lxml.etree.c:60874)
  File "serializer.pxi", line 592, in lxml.etree._tofilelikeC14N (src\lxml\lxml.etree.c:124249)
  File "lxml.etree.pyx", line 316, in lxml.etree._ExceptionContext._raise_if_stored (src\lxml\lxml.etree.c:10291)
  File "serializer.pxi", line 407, in lxml.etree._FilelikeWriter.write (src\lxml\lxml.etree.c:121770)
TypeError: string argument expected, got 'bytes'

Do you have any idea what causes this?
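A likely explanation is that C14N output is a UTF-8 byte stream. write_c14n() writes bytes to the file-like object, and io.StringIO rejects bytes under Python 3. The following is a minimal sketch that uses io.BytesIO instead. The sample document is an illustrative assumption.

```python
import io
from lxml import etree

tree = etree.ElementTree(etree.fromstring("<doc><b a='1'/></doc>"))

# write_c14n() produces bytes, so give it a binary buffer.
buf = io.BytesIO()
tree.write_c14n(buf)
c14n_bytes = buf.getvalue()

# tostring() with method="c14n" returns the same canonical bytes directly.
same_bytes = etree.tostring(tree, method="c14n")

# Decode only if text is really needed (C14N output is UTF-8).
c14n_text = c14n_bytes.decode("utf-8")
print(c14n_text)
```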
_________________________________________________________________
Mailing list for the lxml Python XML toolkit - http://lxml.de/
lxml <at> lxml.de
https://mailman-mail5.webfaction.com/listinfo/lxml
Kun JIN | 30 Sep 10:07 2014

How to use the result of nsmap in XPath

Hello,

"nsmap" is a property of lxml elements that returns the namespaces in scope for the element.
Often one of its keys is None (the default namespace), like:

{'olac': 'http://www.language-archives.org/OLAC/1.1/', None:
'http://www.imsglobal.org/xsd/imscp_v1p1'}

I want to use these namespaces in XPath, but I get an error:
TypeError: empty namespace prefix is not supported in XPath

So I have to remove the None key by creating a new dictionary in Python:

dic_ns = {}
for prefix in root.nsmap:
    if prefix is None:  # skip the default namespace, XPath cannot use it
        continue
    dic_ns[prefix] = root.nsmap[prefix]

QUESTION: is there another way to do this?

Thank you in advance,

Bests,

Kun
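A dictionary comprehension is one shorter option. Another is to keep the default namespace under a prefix of your own choosing, because XPath only needs the prefixes bound in the mapping passed as namespaces=. The following is a minimal sketch. The helper name xpath_namespaces and the prefix "ns" are hypothetical.

```python
def xpath_namespaces(nsmap, default_prefix=None):
    """Return a copy of an lxml nsmap that XPath will accept.

    The None key (the default namespace) is dropped. If default_prefix is
    given, the default namespace is re-bound to that prefix instead, so it
    can still be used in XPath expressions.
    """
    result = {prefix: uri for prefix, uri in nsmap.items() if prefix is not None}
    if default_prefix is not None and None in nsmap:
        result[default_prefix] = nsmap[None]
    return result


nsmap = {
    "olac": "http://www.language-archives.org/OLAC/1.1/",
    None: "http://www.imsglobal.org/xsd/imscp_v1p1",
}
print(xpath_namespaces(nsmap))
print(xpath_namespaces(nsmap, default_prefix="ns"))
```

With lxml it would be used as root.xpath("//ns:manifest", namespaces=xpath_namespaces(root.nsmap, "ns")). The element name manifest is only an illustration. A default_prefix that already exists in nsmap is overwritten by the default namespace.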

_________________________________________________________________
Charlie Clark | 24 Sep 11:45 2014

Best way of dealing with empty attributes

Hi,

We often have cases where element attribute values turn out to be None.
This leads to an error in lxml when creating an Element, or when serialising it
with ElementTree. Is it possible to configure the behaviour in such
situations, either to ignore the attribute or to create it with an empty value?

Charlie
-- 
Charlie Clark
Managing Director
Clark Consulting & Research
German Office
Kronenstr. 27a
Düsseldorf
D- 40217
Tel: +49-211-600-3657
Mobile: +49-178-782-6226
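One way to avoid the error is to filter the attribute mapping before creating the element. The following is a minimal sketch. The helper name clean_attrib and its empty_value parameter are hypothetical. The standard library's ElementTree is used only so the snippet runs without lxml. lxml's Element() accepts the same attribute dict as its second argument.

```python
import xml.etree.ElementTree as ET


def clean_attrib(attrs, empty_value=None):
    """Return a copy of attrs that is safe to pass to Element().

    By default, attributes whose value is None are dropped. If empty_value
    is given (for example ""), they are kept with that value instead.
    """
    if empty_value is None:
        return {k: v for k, v in attrs.items() if v is not None}
    return {k: (empty_value if v is None else v) for k, v in attrs.items()}


attrs = {"r": "A1", "t": None}
dropped = ET.Element("c", clean_attrib(attrs))                   # no "t" attribute
kept_empty = ET.Element("c", clean_attrib(attrs, empty_value=""))  # t=""
print(ET.tostring(dropped))
print(ET.tostring(kept_empty))
```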
_________________________________________________________________
tcqwerty | 23 Sep 23:48 2014

possible bug: parse() and fromstring() have different results

Basically, I would expect ET.parse(StringIO(test_string)) and
ET.fromstring(test_string) to return the same data. Instead, parse()
returns an ElementTree, and fromstring() returns an Element.
----
from StringIO import StringIO
from lxml import etree as ET
test_string = """<root>data</root>"""
tree1 = ET.parse(StringIO(test_string))
tree2 = ET.fromstring(test_string)
print ET.tostring(tree1), type(tree1)
print ET.tostring(tree2), type(tree2)
----
Output:
<root>data</root> <type 'lxml.etree._ElementTree'>
<root>data</root> <type 'lxml.etree._Element'>
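This difference is by design. parse() returns an ElementTree that wraps the whole document, and fromstring() returns the root Element. Calling getroot() on the tree gives the equivalent of what fromstring() returns. The following is a minimal sketch in Python 3 syntax. It falls back to the standard library's ElementTree, which behaves the same way here, if lxml is not installed.

```python
from io import StringIO

try:
    from lxml import etree as ET
except ImportError:
    import xml.etree.ElementTree as ET

test_string = "<root>data</root>"

tree = ET.parse(StringIO(test_string))         # an ElementTree (document wrapper)
root_from_parse = tree.getroot()               # its root Element
root_from_string = ET.fromstring(test_string)  # already the root Element

print(ET.tostring(root_from_parse) == ET.tostring(root_from_string))
```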

_________________________________________________________________
Patrick Asselman | 21 Sep 22:53 2014

(no subject)

Can somebody please shed some light on what is going on here?

lxml seems to be installed, according to pip:

$ pip freeze
lxml==3.4.0

But for some reason it is not found by Python:

$ python3
Python 3.4.0 (v3.4.0:04f714765c13, Mar 15 2014, 23:02:41) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import lxml
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
ImportError: No module named 'lxml'
>>>

Also help('modules') doesn't show lxml in the list.

This is on OS X 10.9.5.

Before this, I installed py34-lxml through MacPorts and it said "Installing py34-lxml @3.3.5_0", which
made me think it was installing an older version, so I uninstalled it and used pip instead. Pip completes
the installation successfully, but the module cannot be found by Python. Any suggestions?

Best regards, 
Patrick
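A frequent cause of this symptom is that pip belongs to a different Python installation than the python3 being run. Here that could be the MacPorts or system Python on one side and the python.org 3.4 build on the other. Running python3 -m pip install lxml installs into the interpreter that is actually started. The following sketch, run with that python3, shows which interpreter is in use and whether it can see a given module. The helper name where_is is hypothetical.

```python
import importlib.util
import sys


def where_is(module_name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None
    return spec.origin


print("interpreter:", sys.executable)
print("lxml found at:", where_is("lxml"))
```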
_________________________________________________________________
Charlie Clark | 19 Sep 20:23 2014
Hi,

I'm trying to build lxml from source on Mac OS (it builds fine as a
dependency) but I seem to be hitting a wall:

Trying to build without Cython, but pre-generated 'src/lxml/lxml.etree.c'

I get this whether or not I use Cython (it's installed), and also with
python setup.py build --static-deps.

libxml2 and libxslt are installed.

Charlie
-- 
Charlie Clark
Kronenstr. 27a
Düsseldorf
D- 40217
Tel: +49-211-938-5360
Mobile: +49-178-782-6226
_________________________________________________________________
Burak Arslan | 16 Sep 10:12 2014

missing void elements

Hello,

Where's the lxml issue tracker? I couldn't find it.

Void elements are HTML tags that don't need a closing tag. <br> is a
well-known example:

>>> from lxml import etree, html
>>> from lxml.builder import E
>>> html.tostring(E.br())
'<br>'
>>> etree.tostring(E.br())
'<br/>'
>>>

<p> is not a void element, so:

>>> html.tostring(E.p())
'<p></p>'
>>> etree.tostring(E.p())
'<p/>'

See the full list:

http://www.w3.org/TR/html5/syntax.html#elements-0

While working on etree.htmlfile, I noticed that the following tags are
not treated as void:

embed, keygen, source, track, wbr

See: https://github.com/lxml/lxml/pull/142#issuecomment-55588559

Because of this, lxml can produce invalid HTML.

Once this issue is resolved, the tuple at:

https://github.com/lxml/lxml/commit/c039e03798d84ac3f897354f0780f15346da1361#diff-4eda3aa1c784f7867d1cb9fc5e041282R344

needs to be updated.

best,
burak
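The gap can be checked mechanically. The following is a minimal sketch. HTML5_VOID lists the void elements from the W3C HTML5 recommendation linked above. OLD_EMPTY is a hypothetical HTML 4 style set that stands in for whatever tuple lxml actually uses. It is not copied from lxml's source.

```python
# Void elements per the W3C HTML5 recommendation (syntax section linked above).
HTML5_VOID = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "keygen", "link", "meta", "param", "source", "track", "wbr",
})

# Hypothetical HTML 4 style list of empty elements, for illustration only.
OLD_EMPTY = frozenset({
    "area", "base", "basefont", "br", "col", "frame", "hr", "img",
    "input", "isindex", "link", "meta", "param",
})


def missing_void(known):
    """Return, sorted, the HTML5 void elements that a serializer's list lacks."""
    return sorted(HTML5_VOID - set(known))


print(missing_void(OLD_EMPTY))
```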

_________________________________________________________________
Stefan Behnel | 10 Sep 19:19 2014

lxml 3.4.0 released

Hi all,

I just released the final version of lxml 3.4, with no code changes since
the last beta. This is a feature release that mostly cleans up some prior
deficiencies and speeds up parsing for documents that contain XML-IDs.

Note that this release drops support for older Python versions and now
requires Py2.6/7 or Py3.2+. It also removes support for very old versions
of libxml2 and libxslt (<=2008).

The documentation is here: http://lxml.de/

Download:  http://lxml.de/files/lxml-3.4.0.tgz

Signature: http://lxml.de/files/lxml-3.4.0.tgz.asc

Changelog: http://lxml.de/3.4/changes-3.4.0.html

Github:
https://github.com/lxml/lxml/blob/14505bc62f5f1fc9fb0ff007955f3e67ab4562bb

This release was built using Cython 0.21, but should also build fine with
0.20.x.

If you are interested in commercial support or customisations for the lxml
package, please contact me directly.

Have fun,

Stefan

3.4.0 (2014-09-10)
==================

Features added
--------------

* ``xmlfile(buffered=False)`` disables output buffering and flushes the
  content after each API operation (starting/ending element blocks or
  writes). A new method ``xf.flush()`` can alternatively be used to
  explicitly flush the output.

* ``lxml.html.document_fromstring`` has a new option
  ``ensure_head_body=True`` which will add an empty head and/or body
  element to the result document if missing.

* ``lxml.html.iterlinks`` now returns links inside meta refresh tags.

* New ``XMLParser`` option ``collect_ids=False`` to disable ID hash table
  creation.  This can substantially speed up parsing of documents with many
  different IDs that are not used.

* The parser uses per-document hash tables for XML IDs.  This reduces the
  load of the global parser dict and speeds up parsing for documents with
  many different IDs.

* ``ElementTree.getelementpath(element)`` returns a structural ElementPath
  expression for the given element, which can be used for lookups later.

* ``xmlfile()`` accepts a new argument ``close=True`` to close file(-like)
  objects after writing to them.  Before, ``xmlfile()`` only closed the
  file if it had opened it internally.

* Allow "bytearray" type for ASCII text input.

Other changes
-------------

* LP#400588: decoding errors have become hard errors even in recovery mode.
  Previously, they could lead to an internal tree representation in a mixed
  encoding state, which led to very late errors or even silently incorrect
  behaviour during tree traversal or serialisation.

* Requires Python 2.6, 2.7, 3.2 or later. No longer supports
  Python 2.4, 2.5 and 3.1, use lxml 3.3.x for those.

* Requires libxml2 2.7.0 or later and libxslt 1.1.23 or later,
  use lxml 3.3.x with older versions.
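Two of the new features are sketched below. This is a minimal sketch that assumes lxml 3.4 or later. The element names are illustrative. The path from getelementpath() can be resolved again with find(). With xmlfile(buffered=False), output reaches the target file as soon as each call returns.

```python
import io
from lxml import etree

# ElementTree.getelementpath(): a structural path usable for later lookups.
root = etree.fromstring("<root><a/><b/><b/></root>")
tree = etree.ElementTree(root)
second_b = root[2]
path = tree.getelementpath(second_b)
found = tree.find(path)
print(path, found is second_b)

# xmlfile(buffered=False): content is flushed after each API operation.
out = io.BytesIO()
with etree.xmlfile(out, buffered=False) as xf:
    with xf.element("root"):
        xf.write("text")
        partial = out.getvalue()  # already written, although the element is still open
final = out.getvalue()
print(partial, final)
```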
_________________________________________________________________
Charlie Clark | 10 Sep 12:17 2014

Comparing XML requires unicode?

Hi,

Last year Stefan very kindly showed me how to use LXMLOutputChecker to
compare XML trees. This is a lifesaver if you generate a lot of XML and
want to check it: we're using it extensively in openpyxl.

But recently I've found myself bashing my head against it repeatedly, as it
seems to work with unicode only, which means I can't simply use
compare_xml(tostring(tree), expected), a wrapper function from
https://bitbucket.org/openpyxl/openpyxl/src/03cb2a7f046d02ec3a19cbeba4375b6d6a19db73/openpyxl/tests/helper.py?at=default#cl-68

I can work around this using a helper function or lxml's handy
tounicode(), except that we also need to run the tests without lxml
installed.

Am I missing something simple?

Charlie
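Both lxml and the standard library's ElementTree accept tostring(..., encoding="unicode"). This returns str rather than bytes, so it is one way to get text from either library. The following is a minimal sketch. as_text is a hypothetical helper for XML that has already been serialised to bytes. Decoding as UTF-8 is an assumption, and it holds for tostring()'s default ASCII output.

```python
try:
    from lxml.etree import Element, SubElement, tostring
except ImportError:  # the tests must also run without lxml
    from xml.etree.ElementTree import Element, SubElement, tostring


def as_text(xml):
    """Return serialised XML as str, decoding bytes as UTF-8 (an assumption)."""
    if isinstance(xml, bytes):
        return xml.decode("utf-8")
    return xml


root = Element("worksheet")
SubElement(root, "row", r="1")

text = tostring(root, encoding="unicode")  # str with either library
also_text = as_text(tostring(root))        # default bytes output decoded to str
print(text)
```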
_________________________________________________________________
Stefan Behnel | 5 Sep 15:35 2014

lxml 3.4.0 beta 1 released

Hi all,

I just released the first beta version of the upcoming lxml 3.4. This is a
feature release that mostly cleans up some prior deficiencies and speeds up
parsing for documents that contain XML-IDs. Please give it some testing
against your code.

Note that this release drops support for older Python versions and now
requires Py2.6/7 or Py3.2+. It also removes support for very old versions
of libxml2 and libxslt (<=2008).

The documentation is here: http://lxml.de/

Download:  http://lxml.de/files/lxml-3.4.0beta1.tgz

Signature: http://lxml.de/files/lxml-3.4.0beta1.tgz.asc

Changelog: http://lxml.de/3.4/changes-3.4.0beta1.html

Github:
https://github.com/lxml/lxml/commit/638b9ce006ba32e46a09101e15c93ee94649a2ae

This release was built using a pre-release version of Cython 0.21
(7a47dfdabcb9a9861480b1437f092c5f84911558). The final release is expected
to use Cython 0.21 (but should build just fine with 0.20.x).

If you are interested in commercial support or customisations for the lxml
package, please contact me directly.

Have fun,

Stefan

3.4.0beta1 (2014-09-05)
=======================

Features added
--------------

* ``xmlfile(buffered=False)`` disables output buffering and flushes the
  content after each API operation (starting/ending element blocks or
  writes). A new method ``xf.flush()`` can alternatively be used to
  explicitly flush the output.

* ``lxml.html.document_fromstring`` has a new option
  ``ensure_head_body=True`` which will add an empty head and/or body
  element to the result document if missing.

* ``lxml.html.iterlinks`` now returns links inside meta refresh tags.

* New ``XMLParser`` option ``collect_ids=False`` to disable ID hash table
  creation.  This can substantially speed up parsing of documents with many
  different IDs that are not used.

* The parser uses per-document hash tables for XML IDs.  This reduces the
  load of the global parser dict and speeds up parsing for documents with
  many different IDs.

* ``ElementTree.getelementpath(element)`` returns a structural ElementPath
  expression for the given element, which can be used for lookups later.

* ``xmlfile()`` accepts a new argument ``close=True`` to close file(-like)
  objects after writing to them.  Before, ``xmlfile()`` only closed the
  file if it had opened it internally.

* Allow "bytearray" type for ASCII text input.

Other changes
-------------

* LP#400588: decoding errors have become hard errors even in recovery mode.
  Previously, they could lead to an internal tree representation in a mixed
  encoding state, which led to very late errors or even silently incorrect
  behaviour during tree traversal or serialisation.

* Requires Python 2.6, 2.7, 3.2 or later. No longer supports
  Python 2.4, 2.5 and 3.1, use lxml 3.3.x for those.

* Requires libxml2 2.7.0 or later and libxslt 1.1.23 or later,
  use lxml 3.3.x with older versions.
_________________________________________________________________