Felix Schwarz | 27 Oct 13:28 2014

building lxml with pypy: "pyconfig.h" not found

Hi,

I just tried to build lxml 3.4.0 using pypy 2.2.1 (on Fedora 20), but the build failed
due to a missing pyconfig.h.

Is that expected behavior? I see that there are passing travis builds for the
github version so it should be possible somehow.

I'm not sure where to start debugging the issue as it could be an lxml problem,
a pypy error or a Fedora packaging bug, so I decided to start at the source :-)

I started out with the lxml 3.4.0 tar.gz from pypi because I tried to avoid
rebuilding all the cython stuff. Maybe that's the problem?

$ pypy setup.py build
Building lxml version 3.4.0.
Building without Cython.
Using build configuration of libxslt 1.1.28
Building against libxml2/libxslt in the following directory: /usr/lib64
/usr/lib64/pypy-2.2.1/lib-python/2.7/distutils/dist.py:267: UserWarning:
Unknown distribution option: 'bugtrack_url'
  warnings.warn(msg)
running build
running build_py
copying src/lxml/includes/lxml-version.h ->
build/lib.linux-x86_64-2.7/lxml/includes
running build_ext
building 'lxml.etree' extension
cc -O2 -fPIC -Wimplicit -I/usr/include/libxml2
-I/home/fs/code/szoska/fiverx/lxml-3.4.0/src/lxml/includes
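
A quick way to narrow down a missing pyconfig.h is to ask the interpreter where it expects its C headers and whether the file is actually there. This is a minimal sketch using only the standard sysconfig module; the helper name find_pyconfig is made up for illustration, and it is meant to be run with the same interpreter binary that is used for the build.

```python
import os
import sysconfig


def find_pyconfig():
    """Return (expected path of pyconfig.h, whether that file exists)."""
    # "include" is the directory where this interpreter expects its C headers.
    include_dir = sysconfig.get_path("include")
    candidate = os.path.join(include_dir, "pyconfig.h")
    return candidate, os.path.isfile(candidate)


path, present = find_pyconfig()
print("expected header:", path)
print("present:", present)
```

If the header is reported as missing, it may be shipped in a separate development package; that is an assumption to check against the distribution's packaging, not something the build log above confirms.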

Charlie Clark | 23 Oct 11:48 2014

Generating code from schema

Hi,

just a quick question about what you can and cannot do with lxml's schema  
support. In openpyxl we're moving towards a 1:1 implementation of the  
underlying schema. lxml.objectify isn't directly an option for two  
reasons: lxml is an optional dependency and there are cases where we'd  
definitely run out of memory. Instead we're using descriptors to enforce  
type definitions. This means a little more code, but now that it seems to
be working well I was wondering whether we could automate some of the
process. I've looked at some of the existing XSD to Python generators but  
the generated code is far from what I'd like to have.
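
For readers unfamiliar with the approach, a descriptor that enforces a type might look roughly like the sketch below. It is not openpyxl's code: the names Typed, Column, min_col and width are hypothetical, and __set_name__ requires Python 3.6 or later.

```python
class Typed(object):
    """Descriptor that coerces assigned values to ``expected_type``."""

    def __init__(self, expected_type):
        self.expected_type = expected_type
        self.name = None

    def __set_name__(self, owner, name):
        # Called automatically at class creation time (Python 3.6+).
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        if not isinstance(value, self.expected_type):
            # Conversion raises ValueError/TypeError for unusable input.
            value = self.expected_type(value)
        instance.__dict__[self.name] = value


class Column(object):  # hypothetical class standing in for a schema type
    min_col = Typed(int)
    width = Typed(float)


col = Column()
col.width = "12.5"   # stored as the float 12.5
col.min_col = 3
```

Assigning a value that cannot be converted, such as col.min_col = "abc", raises ValueError at assignment time rather than at serialisation time.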

Can we use the lxml schema support for anything other than validation? I.e.,
can I query a schema object for a particular definition? Or is the best
approach to parse the XSD files directly and work through the definitions
with a mapping?
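
The second option, parsing the XSD as ordinary XML and mapping its type names to Python types, can be sketched with the standard library alone. Everything specific here is an assumption made for illustration: the schema fragment, the CT_Col type name, the TYPE_MAP table and the attribute_types helper are all invented, and a real schema would need many more cases (references, groups, restrictions).

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Made-up schema fragment, not taken from any real specification file.
SAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="CT_Col">
    <xs:attribute name="min" type="xs:unsignedInt" use="required"/>
    <xs:attribute name="width" type="xs:double" use="optional"/>
  </xs:complexType>
</xs:schema>"""

# Hypothetical mapping from XSD built-in types to Python types.
TYPE_MAP = {"xs:unsignedInt": int, "xs:double": float}


def attribute_types(xsd_text, type_name):
    """Map attribute names of one complexType to Python types."""
    schema = ET.fromstring(xsd_text)
    for ctype in schema.iter(XS + "complexType"):
        if ctype.get("name") == type_name:
            return {
                attr.get("name"): TYPE_MAP[attr.get("type")]
                for attr in ctype.iter(XS + "attribute")
            }
    raise KeyError(type_name)


print(attribute_types(SAMPLE_XSD, "CT_Col"))
```

The resulting dictionary could then drive whatever code generation is wanted, for example one typed descriptor per attribute.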

Charlie
--

-- 
Charlie Clark
Managing Director
Clark Consulting & Research
German Office
Kronenstr. 27a
Düsseldorf
D- 40217
Tel: +49-211-600-3657
Mobile: +49-178-782-6226
_________________________________________________________________
Mailing list for the lxml Python XML toolkit - http://lxml.de/

jens quade | 16 Oct 13:40 2014

XMLParser mode resolve_entities=False and entities in attributes

Hi,

this has been discussed before in 11/2009, but the bug seems to persist, so I will try to document it again:

If an XML parser is created with XMLParser(resolve_entities=False), and the document used declares an
external DTD, then entities in attributes are inserted into the parent element (if a parent element
exists) directly before the element containing that attribute.

Expected behaviour:

- an error, because entities are undeclared; or, more useful in some cases:
- Entities stay in their attributes

Workarounds:

- Declare an internal DTD that defines all entities

- Use an actual external DTD *and* use dtd_validation=True with XMLParser
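
The first workaround amounts to prepending an internal DTD subset that declares every entity the document uses. The sketch below builds such a document and parses it with the standard library's ElementTree so that it runs without lxml; it only shows that the declarations make the entities resolvable and does not verify how lxml behaves with resolve_entities=False. The helper name with_internal_dtd and the ENTITY_DECLS table are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Entity name -> replacement text (here as numeric character references).
ENTITY_DECLS = {"uuml": "&#252;", "ouml": "&#246;"}


def with_internal_dtd(root_name, body, entities):
    """Prefix ``body`` with a DOCTYPE whose internal subset declares ``entities``."""
    decls = "".join(
        '<!ENTITY %s "%s">' % (name, value)
        for name, value in sorted(entities.items())
    )
    return "<!DOCTYPE %s [%s]>%s" % (root_name, decls, body)


doc = with_internal_dtd(
    "test", '<test>1<a href="&uuml;bel">&ouml;</a></test>', ENTITY_DECLS
)
root = ET.fromstring(doc)
link = root.find("a")
print(link.get("href"), link.text)
```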

Sample code: (see also: http://pastebin.com/24bM98La -- some more examples there)

from lxml import etree

# Leave entity references unresolved instead of expanding them.
parser = etree.XMLParser(resolve_entities=False)

try:
    tree = etree.XML("""<test>1<a href="&uuml;bel">&ouml;</a></test>""", parser=parser)
except etree.XMLSyntaxError as e:
    print(e)

Output:

Pegerto Fernández | 3 Oct 12:44 2014

Issue writing with 3.4 at python 3.2 at windows

Hello Team,

I am trying to use lxml to perform C14N canonicalization on a document:

        c14ndoc = io.StringIO()
        tree.write_c14n(c14ndoc)

I am having this issue:

    tree.write_c14n(c14ndoc)
  File "lxml.etree.pyx", line 2271, in lxml.etree._ElementTree.write_c14n (src\lxml\lxml.etree.c:60874)
  File "serializer.pxi", line 592, in lxml.etree._tofilelikeC14N (src\lxml\lxml.etree.c:124249)
  File "lxml.etree.pyx", line 316, in lxml.etree._ExceptionContext._raise_if_stored (src\lxml\lxml.etree.c:10291)
  File "serializer.pxi", line 407, in lxml.etree._FilelikeWriter.write (src\lxml\lxml.etree.c:121770)
TypeError: string argument expected, got 'bytes'

Do you have any idea?
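
The traceback suggests that the C14N serializer hands bytes to the file-like object's write() method, which a text-mode io.StringIO rejects. The sketch below reproduces that with the standard library alone and shows that io.BytesIO accepts the same data. The payload is a stand-in for serializer output, not something lxml produced.

```python
import io

# Stand-in for the bytes a serializer would pass to write().
c14n_bytes = b"<root></root>"

text_buffer = io.StringIO()
try:
    text_buffer.write(c14n_bytes)
    error_message = None
except TypeError as exc:
    # StringIO only accepts str, so writing bytes fails.
    error_message = str(exc)

byte_buffer = io.BytesIO()
byte_buffer.write(c14n_bytes)
result = byte_buffer.getvalue().decode("utf-8")

print(error_message)
print(result)
```

Under that assumption, passing an io.BytesIO to tree.write_c14n() and decoding the collected bytes afterwards would be the first thing to try.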
_________________________________________________________________
Mailing list for the lxml Python XML toolkit - http://lxml.de/
lxml@lxml.de
https://mailman-mail5.webfaction.com/listinfo/lxml
Kun JIN | 30 Sep 10:07 2014

How to use result of nsmap in xpath

Hello,

"nsmap" is a property of lxml elements that returns the element's namespace
mapping. It usually contains a None key for the default namespace, like:

{'olac': 'http://www.language-archives.org/OLAC/1.1/', None: 
'http://www.imsglobal.org/xsd/imscp_v1p1'}

I want to use these namespaces in XPath, but I get an error:
TypeError: empty namespace prefix is not supported in XPath

So I have to remove the None key by creating a new dictionary in Python:

dic_ns = {}
for prefix, uri in root.nsmap.items():
    # XPath cannot use the default (None) prefix, so skip it.
    if prefix is not None:
        dic_ns[prefix] = uri

QUESTION: is there another way to do this?
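
One shorter variant is a dictionary comprehension, optionally re-registering the default namespace under a prefix of your own choosing so that it stays usable in XPath. This is a sketch on a plain dictionary copied from the example above; the prefix "ims" and the element name in the comment are arbitrary choices for illustration.

```python
# Same shape as the nsmap shown above.
nsmap = {
    "olac": "http://www.language-archives.org/OLAC/1.1/",
    None: "http://www.imsglobal.org/xsd/imscp_v1p1",
}

# Drop the default-namespace entry, which XPath cannot address.
xpath_ns = {prefix: uri for prefix, uri in nsmap.items() if prefix is not None}

# Optionally bind the default namespace to an explicit, made-up prefix.
if None in nsmap:
    xpath_ns["ims"] = nsmap[None]

print(sorted(xpath_ns))

# With lxml the mapping would then be passed along, for example:
#   root.xpath("//ims:manifest", namespaces=xpath_ns)
```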

Thank you in advance,

Best,

Kun

Charlie Clark | 24 Sep 11:45 2014

Best way of dealing with empty attributes

Hi,

we often have the case where element attributes may turn out to be None.
This leads to an error in lxml when creating an Element, or when serialising it
in ElementTree. Is it possible to configure the behaviour in such
situations? Either to ignore the attribute or just create it with no value?
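
Independent of any library setting, the "ignore the attribute" variant can be handled before the element is created by filtering the mapping. A minimal sketch, using the standard library's ElementTree so that it runs without lxml; the helper name drop_none and the sample attributes are made up for illustration.

```python
import xml.etree.ElementTree as ET


def drop_none(attrs):
    """Return a copy of ``attrs`` without the keys whose value is None."""
    return {key: value for key, value in attrs.items() if value is not None}


attrs = {"r": "A1", "s": None, "t": "n"}
cell = ET.Element("c", drop_none(attrs))
print(ET.tostring(cell))
```

For the "create it with no value" variant, the same helper could substitute an empty string for None instead of dropping the key.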

Charlie
--

-- 
Charlie Clark
Managing Director
Clark Consulting & Research
German Office
Kronenstr. 27a
Düsseldorf
D- 40217
Tel: +49-211-600-3657
Mobile: +49-178-782-6226
tcqwerty | 23 Sep 23:48 2014

possible bug: parse() and fromstring() have different results

Basically, I would expect ET.parse(StringIO(test_string)) and
ET.fromstring(test_string) to return the same data. Instead, parse()
returns an ElementTree, and fromstring() returns an Element.
----
from StringIO import StringIO
from lxml import etree as ET
test_string = """<root>data</root>"""
tree1 = ET.parse(StringIO(test_string))
tree2 = ET.fromstring(test_string)
print ET.tostring(tree1), type(tree1)
print ET.tostring(tree2), type(tree2)
----
Output:
<root>data</root> <type 'lxml.etree._ElementTree'>
<root>data</root> <type 'lxml.etree._Element'>
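
The difference matches the ElementTree API that lxml follows: parse() returns a tree wrapper, and its getroot() method gives the root element that fromstring() returns directly. The sketch below shows this with the standard library's xml.etree.ElementTree so that it runs without lxml.

```python
import io
import xml.etree.ElementTree as ET

test_string = "<root>data</root>"

tree = ET.parse(io.StringIO(test_string))      # ElementTree wrapper
root_from_tree = tree.getroot()                # Element
root_from_string = ET.fromstring(test_string)  # Element

# Both routes yield an equivalent root element.
same = ET.tostring(root_from_tree) == ET.tostring(root_from_string)
print(type(tree).__name__, root_from_tree.tag, same)
```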

Patrick Asselman | 21 Sep 22:53 2014

(no subject)

Can somebody please shed some light on what is going on here?

Lxml seems to be installed according to pip:

$ pip freeze
lxml==3.4.0

But for some reason it is not found by Python:

$ python3
Python 3.4.0 (v3.4.0:04f714765c13, Mar 15 2014, 23:02:41) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import lxml
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
ImportError: No module named 'lxml'
>>>

Also help('modules') doesn't show lxml in the list.

This is on OS X 10.9.5.

Before this, I installed py34-lxml through MacPorts and it said "Installing py34-lxml @3.3.5_0", which
made me think it was installing an older version, so I chose to uninstall it and use pip instead. Pip completes
the installation successfully, but the module cannot be found by Python. Any suggestions?
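
A common cause of this symptom is that pip and python3 belong to different Python installations. Whether that is the case here is not confirmed by the output above, but it can be checked from inside the interpreter. A minimal sketch using only the standard library; the helper name module_origin is made up for illustration.

```python
import importlib.util
import sys


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


print("interpreter:", sys.executable)
print("lxml found at:", module_origin("lxml"))
for entry in sys.path:
    print("search path:", entry)
```

Running "python3 -m pip install lxml" instead of a bare "pip" makes sure the package is installed for the interpreter that will import it.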

Best regards, 
Patrick
Charlie Clark | 19 Sep 20:23 2014
Hi,

I'm trying to build lxml from source on Mac OS (it builds fine as a
dependency) but I seem to be hitting a wall:

Trying to build without Cython, but pre-generated 'src/lxml/lxml.etree.c'

I get this whether or not I'm using Cython (it's installed), and also with
python setup.py build --static-deps.

libxml2 and libxslt are installed.

Charlie
--

-- 
Charlie Clark
Kronenstr. 27a
Düsseldorf
D- 40217
Tel: +49-211-938-5360
Mobile: +49-178-782-6226
Burak Arslan | 16 Sep 10:12 2014

missing void elements

Hello,

Where's the lxml issue tracker? I couldn't find it.

Void elements are html tags that don't need a closing tag. <br> is a
well known example:

>>> from lxml import etree, html
>>> from lxml.builder import E
>>> html.tostring(E.br())
'<br>'
>>> etree.tostring(E.br())
'<br/>'
>>>

<p> is not a void element, so:

>>> html.tostring(E.p())
'<p></p>'
>>> etree.tostring(E.p())
'<p/>'

see the full list:

http://www.w3.org/TR/html5/syntax.html#elements-0

While working on etree.htmlfile, I noticed that the following tags are
not treated as void:

embed, keygen, source, track, wbr

see: https://github.com/lxml/lxml/pull/142#issuecomment-55588559

Because of this, lxml can produce invalid HTML.
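
To make the rule concrete, here is a small sketch of void-aware serialisation of an empty element. It is not lxml's implementation: the set below is the void-element list as I read it in the HTML5 specification linked above, and the names HTML5_VOID_ELEMENTS, REPORTED_MISSING and serialize_empty are made up for illustration.

```python
# Void elements per the HTML5 specification (no end tag allowed).
HTML5_VOID_ELEMENTS = frozenset([
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "keygen", "link", "meta", "param", "source", "track", "wbr",
])

# The tags reported above as not being treated as void.
REPORTED_MISSING = frozenset(["embed", "keygen", "source", "track", "wbr"])


def serialize_empty(tag):
    """Serialise an empty element the way an HTML serialiser should."""
    if tag in HTML5_VOID_ELEMENTS:
        return "<%s>" % tag
    return "<%s></%s>" % (tag, tag)


for tag in sorted(REPORTED_MISSING):
    print(serialize_empty(tag))
print(serialize_empty("p"))
```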

once this issue is resolved, the tuple at:

https://github.com/lxml/lxml/commit/c039e03798d84ac3f897354f0780f15346da1361#diff-4eda3aa1c784f7867d1cb9fc5e041282R344

needs to be updated.

best,
burak

Stefan Behnel | 10 Sep 19:19 2014

lxml 3.4.0 released

Hi all,

I just released the final version of lxml 3.4, with no code changes since
the last beta. This is a feature release that mostly cleans up some prior
deficiencies and speeds up parsing for documents that contain XML-IDs.

Note that this release drops support for older Python versions and now
requires Py2.6/7 or Py3.2+. It also removes support for very old versions
of libxml2 and libxslt (<=2008).

The documentation is here: http://lxml.de/

Download:  http://lxml.de/files/lxml-3.4.0.tgz

Signature: http://lxml.de/files/lxml-3.4.0.tgz.asc

Changelog: http://lxml.de/3.4/changes-3.4.0.html

Github:
https://github.com/lxml/lxml/blob/14505bc62f5f1fc9fb0ff007955f3e67ab4562bb

This release was built using Cython 0.21, but should also build fine with
0.20.x.

If you are interested in commercial support or customisations for the lxml
package, please contact me directly.

Have fun,

Stefan

3.4.0 (2014-09-10)
==================

Features added
--------------

* ``xmlfile(buffered=False)`` disables output buffering and flushes the
  content after each API operation (starting/ending element blocks or
  writes). A new method ``xf.flush()`` can alternatively be used to
  explicitly flush the output.

* ``lxml.html.document_fromstring`` has a new option
  ``ensure_head_body=True`` which will add an empty head and/or body
  element to the result document if missing.

* ``lxml.html.iterlinks`` now returns links inside meta refresh tags.

* New ``XMLParser`` option ``collect_ids=False`` to disable ID hash table
  creation.  This can substantially speed up parsing of documents with many
  different IDs that are not used.

* The parser uses per-document hash tables for XML IDs.  This reduces the
  load of the global parser dict and speeds up parsing for documents with
  many different IDs.

* ``ElementTree.getelementpath(element)`` returns a structural ElementPath
  expression for the given element, which can be used for lookups later.

* ``xmlfile()`` accepts a new argument ``close=True`` to close file(-like)
  objects after writing to them.  Before, ``xmlfile()`` only closed the
  file if it had opened it internally.

* Allow "bytearray" type for ASCII text input.
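
As an illustration of two of the additions above, the following sketch parses a small document with ``collect_ids=False`` and round-trips an element through ``getelementpath()``. It assumes lxml 3.4 or later is installed and does nothing otherwise; the helper name ``roundtrip_path`` and the sample document are made up for illustration.

```python
try:
    from lxml import etree
except ImportError:  # lxml is a third-party package and may be absent
    etree = None


def roundtrip_path(xml_bytes, tag):
    """Return (path, found_again) for the first descendant named ``tag``."""
    # Skip building the ID hash table, since no ID lookups are needed here.
    parser = etree.XMLParser(collect_ids=False)
    root = etree.fromstring(xml_bytes, parser)
    tree = etree.ElementTree(root)
    target = root.find(".//" + tag)
    # Structural ElementPath expression relative to the tree's root.
    path = tree.getelementpath(target)
    return path, tree.find(path) is target


if etree is not None:
    path, found_again = roundtrip_path(b"<a><b><c/></b></a>", "c")
    print(path, found_again)
```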

Other changes
-------------

* LP#400588: decoding errors have become hard errors even in recovery mode.
  Previously, they could lead to an internal tree representation in a mixed
  encoding state, which led to very late errors or even silently incorrect
  behaviour during tree traversal or serialisation.

* Requires Python 2.6, 2.7, 3.2 or later. No longer supports
  Python 2.4, 2.5 and 3.1, use lxml 3.3.x for those.

* Requires libxml2 2.7.0 or later and libxslt 1.1.23 or later,
  use lxml 3.3.x with older versions.