There appears to be a bug with lxml.sax's handling of comments, as the following code causes lxml.sax.saxify to fail:
"""
import lxml.etree, lxml.sax, xml.sax.handler
from cStringIO import StringIO
p = lxml.etree.HTMLParser(remove_blank_text=True)
h = xml.sax.handler.ContentHandler()
f = StringIO("<body><!-- foo --><p>bar</p></body>")
t = lxml.etree.parse(f, p)
lxml.sax.saxify(t, h)
"""
"""
Traceback (most recent call last):
  File "saxBug.py", line 11, in <module>
    lxml.sax.saxify(t, h)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/lxml-1.3beta-py2.5-macosx-10.4-i386.egg/lxml/sax.py", line 178, in saxify
    return ElementTreeProducer(element_or_tree, content_handler).saxify()
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/lxml-1.3beta-py2.5-macosx-10.4-i386.egg/lxml/sax.py", line 130, in saxify
    self._recursive_saxify(self._element, {})
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/lxml-1.3beta-py2.5-macosx-10.4-i386.egg/lxml/sax.py", line 160, in _recursive_saxify
    self._recursive_saxify(child, prefixes)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/lxml-1.3beta-py2.5-macosx-10.4-i386.egg/lxml/sax.py", line 160, in _recursive_saxify
    self._recursive_saxify(child, prefixes)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/lxml-1.3beta-py2.5-macosx-10.4-i386.egg/lxml/sax.py", line 149, in _recursive_saxify
    ns_uri, local_name = _getNsTag(element.tag)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/lxml-1.3beta-py2.5-macosx-10.4-i386.egg/lxml/sax.py", line 8, in _getNsTag
    if tag[0] == '{':
TypeError: 'builtin_function_or_method' object is unsubscriptable
"""
I have been able to replicate the above error with both release and svn lxml, as well as with both Apple-supplied libxml2/libxslt and up-to-date libraries.
Also, though I doubt this is related, `make test` fails for me on OS X 10.4.9 with MacPython 2.5.1 (python.org binary):
"""
python test.py -p -v
TESTED VERSION:
Python: (2, 5, 1, 'final', 0)
lxml.etree: (1, 3, -1, 42667)
libxml used: (2, 6, 28)
libxml compiled: (2, 6, 28)
libxslt used: (1, 1, 20)
libxslt compiled: (1, 1, 20)
733/733 (100.0%): Doctest: xpathxslt.txt
======================================================================
FAIL: test_module_HTML_unicode (lxml.tests.test_htmlparser.HtmlParserTestCaseBase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/unittest.py", line 260, in run
    testMethod()
  File "/Users/erik/Projects/lxml/src/lxml/tests/test_htmlparser.py", line 33, in test_module_HTML_unicode
    self.uhtml_str)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/unittest.py", line 334, in failUnlessEqual
    (msg or '%r != %r' % (first, second))
AssertionError: u'<html><head><title>test \xc3\x83\xc2\xa1\xef\xa3\x92</title></head><body><h1>page \xc3\x83\xc2\xa1\xef\xa3\x92 title</h1></body></html>' != u'<html><head><title>test \xc3\xa1\uf8d2</title></head><body><h1>page \xc3\xa1\uf8d2 title</h1></body></html>'
----------------------------------------------------------------------
Ran 733 tests in 1.380s
FAILED (failures=1)
"""
--
Erik Swanson