Martin Mueller | 17 Apr 15:15 2014

lxml on Mavericks

Thank you, Marius, for your advice, which worked.

Stefan, would it make sense to add something like the following to your
documentation?

If you have different versions of Python on your machine, the plain
command 'pip' will by default use the version of Python that came with
your system. Use a more specific command such as

	pip3.4 install lxml

to install lxml in a particular version of Python on your machine. Look in
/usr/local/bin for available versions of the pip installer.

There is a bug in Mavericks that may prevent the installation of lxml. As
a workaround, run the command

	export CFLAGS=-Qunused-arguments

before running

	STATIC_DEPS=true pip install lxml

For a discussion of the bug see http://bugs.python.org/issue21244 and
https://github.com/python-imaging/Pillow/issues/527
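The two points above (pick the interpreter explicitly, set the workaround variables) can be sketched in Python. This is only an illustration under assumptions: the helper names build_install_command and build_env are made up for this note, and it assumes that the chosen interpreter can run pip as a module ("python -m pip"), which is another way of tying pip to one specific Python.

```python
import os
import sys


def build_install_command(python=sys.executable, package="lxml"):
    """Return a pip command bound to one specific interpreter.

    Running pip as a module of a given interpreter installs the package
    for that interpreter rather than for the system default.
    """
    return [python, "-m", "pip", "install", package]


def build_env(base=None):
    """Return a copy of the environment with the workaround variables set."""
    env = dict(os.environ if base is None else base)
    env["CFLAGS"] = "-Qunused-arguments"  # workaround flag from the post above
    env["STATIC_DEPS"] = "true"           # build libxml2/libxslt statically
    return env


if __name__ == "__main__":
    # Printing only; to really install, pass both values to
    # subprocess.check_call(command, env=environment).
    command = build_install_command()
    environment = build_env()
    print(" ".join(command))
```

Replacing sys.executable with a path such as one of the interpreters found next to the pip scripts in /usr/local/bin selects a different Python.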

Martin Mueller
Professor emeritus of English and Classics

_________________________________________________________________

Martin Mueller | 17 Apr 04:46 2014

installing lxml to run with python3.4 on OS Mavericks

I have tried to install lxml on my laptop, which runs OS X Mavericks. Not
much success, and from scanning the Web, I'm not alone.

There are actually two issues here. First, I want to install lxml, and
second, I want it to run with Python 3.4, which I successfully installed
on my computer. 

I used the install routine recommended in Building lxml in Mac OS X and
used the command 

STATIC_DEPS=true pip install lxml

From the log file I gather that everything runs more or less as expected
until it hits a glitch towards the very end, which I reproduce below. I
don't really understand the messages, but I note that the whole installation
process is geared towards the 2.7 version of Python that is part of the
system. I haven't found instructions on how to force pip to look for Python
3.4. Perhaps it doesn't matter.

I'll be grateful for help. On my desktop Mac (Lion) I managed to associate
an earlier version of lxml with Python3, but I don't remember how I did
it. 

copying 
/Users/martin/build/lxml/build/tmp/libxml2/include/libxslt/xsltInternals.h
-> build/lib.macosx-10.9-intel-2.7/lxml/includes/libxslt

copying 
/Users/martin/build/lxml/build/tmp/libxml2/include/libxslt/xsltlocale.h ->
build/lib.macosx-10.9-intel-2.7/lxml/includes/libxslt

Максим Кочкин | 15 Apr 20:33 2014

lxml.html.clean vulnerability

Hi, guys.

I've accidentally found a vulnerability in the clean_html function. A user can break up the scheme of a URL with non-printing characters (\x01-\x08), so that the javascript: check no longer matches. Here is a PoC.


from lxml.html.clean import clean_html

html = '''\
<html>
<body>
<a href="javascript:alert(0)">aaa</a>
<a href="javas\x01cript:alert(1)">bbb</a>
<a href="javas\x02cript:alert(1)">bbb</a>
<a href="javas\x03cript:alert(1)">bbb</a>
<a href="javas\x04cript:alert(1)">bbb</a>
<a href="javas\x05cript:alert(1)">bbb</a>
<a href="javas\x06cript:alert(1)">bbb</a>
<a href="javas\x07cript:alert(1)">bbb</a>
<a href="javas\x08cript:alert(1)">bbb</a>
<a href="javas\x09cript:alert(1)">bbb</a>
</body>
</html>'''

print clean_html(html)


Output:

<div>
<body>
<a href="">aaa</a>
<a href="javascript:alert(1)">bbb</a>
<a href="javascript:alert(1)">bbb</a>
<a href="javascript:alert(1)">bbb</a>
<a href="javascript:alert(1)">bbb</a>
<a href="javascript:alert(1)">bbb</a>
<a href="javascript:alert(1)">bbb</a>
<a href="javascript:alert(1)">bbb</a>
<a href="javascript:alert(1)">bbb</a>
<a href="">bbb</a>
</body>
</div>


I'm not a Python programmer, so I can't give you a quick fix. I found it by black-box testing on a site that uses lxml. I'm not sure whether it's a bug or whether I just got things wrong.
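One possible mitigation, sketched here with the standard library only: strip control characters from the URL before looking at its scheme. This is an assumption-laden illustration, not lxml's code and not necessarily the fix the maintainers would choose; the function names normalized_scheme and is_script_url are hypothetical, and removing ordinary spaces as well is a deliberately conservative choice.

```python
import re

# Assumption: every C0 control character, space and DEL is removed before the
# scheme is inspected, so "javas\x01cript:" is seen as "javascript:".
_CONTROL_CHARS = re.compile(r"[\x00-\x20\x7f]+")


def normalized_scheme(url):
    """Return the lower-cased URL scheme after removing control characters."""
    cleaned = _CONTROL_CHARS.sub("", url)
    scheme, separator, _rest = cleaned.partition(":")
    return scheme.lower() if separator else ""


def is_script_url(url):
    """Report whether the URL would be treated as a script URL."""
    return normalized_scheme(url) in ("javascript", "vbscript")


if __name__ == "__main__":
    for candidate in ("javascript:alert(0)",
                      "javas\x01cript:alert(1)",
                      "http://example.com/"):
        print(repr(candidate), is_script_url(candidate))
```

A caller could run a check like this on each href before deciding whether to keep the attribute.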

----
ksimka (@m_ksimka)
_________________________________________________________________
Mailing list for the lxml Python XML toolkit - http://lxml.de/
lxml@lxml.de
https://mailman-mail5.webfaction.com/listinfo/lxml
Mark Grandi | 4 Apr 23:39 2014

XML Creation - streaming output

Hello,

I have been very happy with lxml so far, so thanks again for maintaining it for so long! However, there is one use case that lxml does not seem to cover, and I'm not sure whether that is a limitation of libxml2: while there is a streaming parser for XML, there is no equivalent for outputting / generating XML. As a result, generating a very large XML file depends on having a large amount of memory, which many people (like me) don't have!

Is there some hidden API in lxml, or maybe an API in libxml2 (that hasn't been made available in lxml), that accomplishes this?
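To make the request concrete, here is a minimal sketch of streaming output that never holds the whole document in memory. It deliberately uses the standard library's xml.sax.saxutils.XMLGenerator rather than lxml, so it says nothing about what lxml itself offers; the element names "records" and "record" and the helper write_records are invented for the example. (Recent lxml releases also document an incremental writer, etree.xmlfile, which may be worth checking.)

```python
import io
from xml.sax.saxutils import XMLGenerator


def write_records(stream, records):
    """Write each record to the stream as soon as it is produced.

    Only the current record is held in memory; `records` may be any
    iterable, including a generator over a very large data set.
    """
    generator = XMLGenerator(stream, encoding="utf-8")
    generator.startDocument()
    generator.startElement("records", {})
    for index, text in enumerate(records):
        generator.startElement("record", {"id": str(index)})
        generator.characters(text)  # special characters are escaped here
        generator.endElement("record")
    generator.endElement("records")
    generator.endDocument()


# Demonstration with an in-memory text buffer; a file opened in text mode
# would be used the same way.
buffer = io.StringIO()
write_records(buffer, ("item %d" % n for n in range(3)))
xml_text = buffer.getvalue()
```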

Thanks!

~mark
_________________________________________________________________
Stefan Behnel | 4 Apr 07:22 2014

Re: Local installation issue

Ivan Pozdeev, 03.04.2014 23:54:
>> KM Here And There, 02.04.2014 23:59:
>>> Following up:
>>>
>>> Jens' note caused me to look somewhere OTHER than:
>>> https://github.com/lxml/lxml (which is where the documentation points
>>> you to).
>>>
>>> And the distribution at
>>> https://pypi.python.org/packages/source/l/lxml/lxml-3.3.3.tar.gz#md5=f2675837b4358a5ecab5fd9a783fd0e5
>>> seems to have the right stuff.
>>>
>>> I think the page at http://lxml.de/build.html is confusing in this
>>> regard and may need a bit of clarification so no one else does what I
>>> did.  Or maybe I'm just a dumb noob.
> 
>> Hmm, it's actually *very* explicit, although that also makes it a bit
>> verbose. Suggestions for improvements welcome.
> 
> 1) Can we get "source code" of these pages so we can suggest patches right off the bat?

https://github.com/lxml/lxml

Specifically, INSTALL.txt and doc/build.txt.

Pull requests welcome.

> 2) "Static linking on Windows":
> 
> 2.1) Replace the section's content with
> "run with --static-deps", possibly abbreviate the former
> content into a brief explanation of what the option does.

Right. I updated it.

> 2.2) Add the "insert http://msinttypes.googlecode.com/svn/trunk/stdint.h to VC9.0's include dir"
> fix that I mentioned in
https://mailman-mail5.webfaction.com/pipermail/lxml/2014-January/007065.html .

That shouldn't be necessary anymore.

Thanks for the comments.

Stefan

_________________________________________________________________
Stefan Behnel | 3 Apr 22:13 2014

lxml 3.3.4 released

Hi all,

I'm happy to announce the release of lxml 3.3.4. This is a bug-fix release
for the stable lxml 3.3 series that adds one little feature: full line
number support (beyond 65535) when using libxml2 2.9.x.

The documentation is here: http://lxml.de/

Download:  http://lxml.de/files/lxml-3.3.4.tgz

Signature: http://lxml.de/files/lxml-3.3.4.tgz.asc

Changelog: http://lxml.de/3.3/changes-3.3.4.html

Github:
https://github.com/lxml/lxml/commit/076efc798ee7eae048d9ee764f30e2980a7c870f

This release was built using Cython 0.20.1.

If you are interested in commercial support or customisations for the lxml
package, please contact me directly.

Have fun,

Stefan

3.3.4 (2014-04-03)
==================

Features added
--------------

* Source line numbers above 65535 are available on Elements when
  using libxml2 2.9 or later.

Bugs fixed
----------

* lxml.html.fragment_fromstring() failed for bytes input in Py3.
_________________________________________________________________
KM Here And There | 2 Apr 22:31 2014

Local installation issue

I'm trying to build a local copy of lxml on an Ubuntu x86-64 system. 
I'm not doing lxml development, just need the package.

The documentation for building on lxml.de says that *all* I need to type is:

    python setup.py build --without-cython

The results below show that there is a file missing from the
distribution.  If the file is supposed to be generated, the
documentation does not say how this is accomplished.

kevin@ubuntu:~/build/lxml-lxml-3.3$ python setup.py build --without-cython
Building lxml version 3.3.3.
WARNING: Trying to build without Cython, but pre-generated
'src/lxml/lxml.etree.c' is not available.
WARNING: Trying to build without Cython, but pre-generated
'src/lxml/lxml.objectify.c' is not available.
Building without Cython.
Using build configuration of libxslt 1.1.28
Building against libxml2/libxslt in the following directory:
/home/kevin/usr/local/lib
/home/kevin/usr/local/lib/python2.7/distutils/dist.py:267: UserWarning:
Unknown distribution option: 'bugtrack_url'
  warnings.warn(msg)
running build
running build_py
copying src/lxml/includes/lxml-version.h ->
build/lib.linux-i686-2.7/lxml/includes
running build_ext
building 'lxml.etree' extension
gcc -pthread -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes
-I/home/kevin/usr/local/include
-I/home/kevin/usr/local/include/python2.7
-I/home/kevin/usr/local/include/libxml2
-I/home/kevin/usr/local/include/libexslt
-I/home/kevin/usr/local/include/libxslt -I/home/kevin/usr/local/include
-fPIC -I/home/kevin/usr/local/include
-I/home/kevin/usr/local/include/libxml2
-I/home/kevin/build/lxml-lxml-3.3/src/lxml/includes
-I/home/kevin/usr/local/include/python2.7 -c src/lxml/lxml.etree.c -o
build/temp.linux-i686-2.7/src/lxml/lxml.etree.o -w
gcc: error: src/lxml/lxml.etree.c: No such file or directory
gcc: fatal error: no input files
compilation terminated.
error: command 'gcc' failed with exit status 4

So it appears that the file 'src/lxml/lxml.etree.c' is missing from the
source distribution.  I've checked 3.3.3 back to 2.3 and it has NEVER
been there AFAICT.

My question is, where do I get this file or how do I generate it, or is
this a bug?

Thanks,
Kevin
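The two warnings in the log name the pre-generated C files that a build with --without-cython expects. A small check like the following, run before the build, shows whether a given source tree contains them. It is a sketch only: the helper name missing_generated_sources is made up, and the file list is taken from the warnings above rather than from lxml's setup.py. The follow-up in this thread notes that the source distribution on PyPI seems to have the right files, unlike the GitHub checkout.

```python
import os

# File names taken from the two WARNING lines in the build log above.
GENERATED_SOURCES = (
    "src/lxml/lxml.etree.c",
    "src/lxml/lxml.objectify.c",
)


def missing_generated_sources(source_root):
    """Return the pre-generated C files that are absent from source_root."""
    missing = []
    for relative_path in GENERATED_SOURCES:
        full_path = os.path.join(source_root, *relative_path.split("/"))
        if not os.path.isfile(full_path):
            missing.append(relative_path)
    return missing


if __name__ == "__main__":
    absent = missing_generated_sources(os.getcwd())
    if absent:
        print("Cannot build without Cython; missing: " + ", ".join(absent))
    else:
        print("Pre-generated C sources found.")
```

If files are missing, the options are to use a distribution that ships them or to install Cython and drop the --without-cython flag.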

_________________________________________________________________
Markus Schöpflin | 19 Mar 09:25 2014

lxml not ignoring whitespace when validating xsd:int

Hello,

I have already asked this on Stack Overflow, but figured that I had better
ask here.

The following python script contains a simple XML schema defining an element 
'a' of integer type and an XML document containing such an element. When 
validating the document against the schema the validation fails.

---%<---
from lxml import etree
from StringIO import StringIO

xmlschema = etree.XMLSchema(etree.parse(StringIO('''\
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
   <xsd:element name="a" type="xsd:int"/>
</xsd:schema>
''')))

xmldoc = etree.parse(StringIO("<a> 42</a>"))

print xmlschema.validate(xmldoc)
--->%---

According to XML Schema Part 2: Datatypes Second Edition, section 4.3.6 all 
atomic data types other than 'string' have their 'whiteSpace' constraint set 
to 'collapse', so I think the element 'a' should be valid.

Am I mistaken or is this a bug? I have found a similar issue on S/O regarding 
the atomic type dateTime, which unfortunately has no solution so far.
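For reference, the 'collapse' behaviour described in the spec can be sketched in a few lines of Python. This illustrates what the question expects a validator to do before checking the lexical form; it is not how libxml2 or lxml implement it, and the helper names collapse_whitespace and parse_xsd_int are hypothetical.

```python
import re


def collapse_whitespace(value):
    """Apply the XML Schema whiteSpace="collapse" normalization.

    Tabs, line feeds and carriage returns become spaces, runs of spaces
    shrink to one space, and leading and trailing spaces are dropped.
    """
    replaced = re.sub(r"[\t\n\r]", " ", value)
    return re.sub(r" +", " ", replaced).strip(" ")


def parse_xsd_int(text):
    """Parse an xsd:int after collapsing whitespace; raise ValueError if invalid."""
    lexical = collapse_whitespace(text)
    if not re.fullmatch(r"[+-]?[0-9]+", lexical):
        raise ValueError("not a valid xsd:int lexical form: %r" % text)
    number = int(lexical)
    if not -2147483648 <= number <= 2147483647:
        raise ValueError("out of range for xsd:int: %r" % text)
    return number


if __name__ == "__main__":
    print(parse_xsd_int(" 42"))
```

Under this reading, the content of the element in the example document would be accepted.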

Regards,
Markus

_________________________________________________________________
Herm Fischer | 17 Mar 19:58 2014

need XML_PARSE_BIG_LINES parser option?

Hi,

For XBRL usage we routinely have huge files that run to millions of 
lines, and I think we need the XML_PARSE_BIG_LINES libxml2 parser option 
so that line numbers are not squeezed into short ints.

Other than recompiling from source, is there a way to expose this option 
(or just use it for everybody)?  (End users don't like the thought of 
custom-hacked lxml.)

    Herm Fischer (for Arelle project)
_________________________________________________________________
Stefan Behnel | 10 Mar 20:11 2014

Announcing lxml's end of support for Python 2.4 and 2.5

Hi everyone,

given that systems using Python 2.4/5 are becoming truly rare these days
(even Py2.6 officially died last year), I've decided to reduce my own
maintenance burden by letting the next lxml release series (the one after
3.3) no longer support these two versions and require Py2.6, Py2.7, Py3.2
or later.

I'm also dropping support for Py3.1 because I doubt that supporting it is
interesting for any reasonable number of people. Lots of other major Python
packages haven't even been ported to anything below Py3.3, so being stuck
with Py3.1 can't be fun anyway.

Hope you don't mind,

Stefan
_________________________________________________________________
terencenwk . | 7 Mar 18:40 2014

How can I use open_http in lxml.html.submit_form to alternate opener?

How can I use the open_http argument of lxml.html.submit_form to supply an alternative opener?

I do not know how to fix the last line of code.

from lxml.html import parse, submit_form

fundurl = 'http://www.aia.com.hk/en/investment-information/fund_search_content_new.jsp?fund=c04&tier=sp_br&todate=&date=&name='
page = parse(fundurl).getroot()
form = page.forms[0]
form.fields['date'] = '08/09/2004'
form.fields['todate'] = '3m'

#### THE FOLLOWING CODE IS INCORRECT ####
result = parse(submit_form(form, {'submit':'Search'},
    open_http('GET', 'http://www.aia.com.hk/en/investment-information/fund_search_content_new.jsp?fund=c04&tier=sp_br&todate=&date=&name=', {'date':'08/19/2004','todate':'3m'})))
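A likely cause of the problem is that open_http is meant to be a function that is passed in, not a function that is called in the argument list: submit_form calls it with the form's method, its action URL and the list of (name, value) pairs. The following is a minimal sketch of such a callable, written against the Python 3 standard library; the helper build_request and the User-Agent string are made up for the example, and the code is untested against the site in question.

```python
from urllib.parse import urlencode
from urllib.request import Request, build_opener


def build_request(method, url, values):
    """Turn (method, url, values) into a urllib Request object."""
    encoded = urlencode(list(values))
    if method.upper() == "GET":
        separator = "&" if "?" in url else "?"
        return Request(url + separator + encoded)
    return Request(url, data=encoded.encode("ascii"))


def open_http(method, url, values):
    """Custom opener with the calling convention submit_form expects.

    Returns a file-like response object that can be handed to parse().
    """
    opener = build_opener()
    opener.addheaders = [("User-Agent", "example-agent/0.1")]  # illustrative
    return opener.open(build_request(method, url, values))


# Intended use (not executed here because it needs network access):
#   result = parse(submit_form(form, {'submit': 'Search'}, open_http=open_http))
```

The form fields set through form.fields are already part of the values that submit_form collects, so they do not need to be repeated in the call.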
_________________________________________________________________