Burak Arslan | 22 Jun 12:32 2016

htmlfile doesn't escape attribute values

Hello,

This is a heads-up for bug 1594155 here: https://bugs.launchpad.net/lxml/+bug/1594155

Please consider the following test case.

>>> from lxml import html
>>> from lxml.etree import htmlfile
>>> from lxml.html.builder import E
>>> from StringIO import StringIO
>>> out = StringIO()
>>> with htmlfile(out) as f:
...     with f.element("tagname", attrib={"attr": '"misquoted"'}):
...         f.write("foo")
...
>>> out.getvalue()
'<tagname attr=""misquoted"">foo</tagname>'

Expected output:

'<tagname attr="&quot;misquoted&quot;">foo</tagname>'

The lack of proper escaping confuses the hell out of browsers :) Proper escaping is needed to safely put HTML documents inside the srcdoc attribute of an <iframe>.

The workaround is to escape the data yourself before putting it inside an attribute. Fortunately, replacing " with &quot; is enough for that. Note that & is not escaped either, although it should be.
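
A minimal sketch of that workaround, assuming an affected lxml version (on a release where htmlfile escapes attribute values itself, this would double-escape). The helper name escape_attr is made up for illustration and is not part of lxml. It also escapes &, which has to be replaced first:

```python
def escape_attr(value):
    """Escape a string for use inside a double-quoted HTML attribute.

    Hypothetical helper, not part of lxml.
    """
    # Replace "&" first; otherwise the "&" of "&quot;" would be escaped again.
    return value.replace("&", "&amp;").replace('"', "&quot;")


print(escape_attr('"misquoted"'))   # &quot;misquoted&quot;
print(escape_attr('a & "b"'))       # a &amp; &quot;b&quot;
```

With htmlfile, the escaped value would then be passed as attrib={"attr": escape_attr(value)}.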

versions:
Python : sys.version_info(major=2, minor=7, micro=11, releaselevel='final', serial=0)

lxml.etree : (3, 6, 0, 0)
libxml used : (2, 9, 3)
libxml compiled : (2, 9, 3)
libxslt used : (1, 1, 28)
libxslt compiled : (1, 1, 28)

Best regards,
Burak
_________________________________________________________________
Mailing list for the lxml Python XML toolkit - http://lxml.de/
lxml <at> lxml.de
https://mailman-mail5.webfaction.com/listinfo/lxml
AAyzenberg | 17 Jun 03:30 2016

Error installing lxml 3.5.0 on CentOS 7


Hi All,
The installation of lxml 3.5.0 is failing on CentOS 7 (x86_64).
Searching the web did not help much.  My last hope is your group.

I tried pip with different arguments:

(.venv) [centos@MyDirectory]$ pip install -r env/requirements

where the file requirements has only one line:  lxml==3.5.0

(.venv) [centos@MyDirectory]$ pip install lxml-3.5.0.tar.gz

[centos@MyDirectory]$ sudo pip install lxml==3.5.0

[centos@MyDirectory]$ sudo pip install lxml-3.5.0.tar.gz

[centos@MyDirectory]$ .venv/bin/pip install lxml-3.5.0.tar.gz

[centos@MyDirectory]$ sudo .venv/bin/pip install lxml-3.5.0.tar.gz

Every time, the first line indicating an error is:
"  Running setup.py install for lxml ... error"

I also tried the command "yum info package", but did not find an
appropriate .rpm file for lxml-3.5.0.

The console output of the last attempt with pip is below (the outputs
of the other attempts are almost the same):

[centos@MyDirectory]$ sudo .venv/bin/pip install lxml-3.5.0.tar.gz
Processing ./lxml-3.5.0.tar.gz
Building wheels for collected packages: lxml
  Running setup.py bdist_wheel for lxml ... error
  Complete output from command /home/kokone/.virtualenv/bin/python -u -c
"import setuptools, tokenize;__file__='/tmp/pip-v7rlpA-build/setup.py';exec
(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n',
'\n'), __file__, 'exec'))" bdist_wheel -d /tmp/tmpNsW741pip-wheel-
--python-tag cp27:
  Building lxml version 3.5.0.
  Building without Cython.
  Using build configuration of libxslt 1.1.28
  Building against libxml2/libxslt in the following directory: /usr/lib64
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.linux-x86_64-2.7
  creating build/lib.linux-x86_64-2.7/lxml
  copying src/lxml/builder.py -> build/lib.linux-x86_64-2.7/lxml
  copying src/lxml/cssselect.py -> build/lib.linux-x86_64-2.7/lxml
  copying src/lxml/pyclasslookup.py -> build/lib.linux-x86_64-2.7/lxml
  copying src/lxml/usedoctest.py -> build/lib.linux-x86_64-2.7/lxml
  copying src/lxml/ElementInclude.py -> build/lib.linux-x86_64-2.7/lxml
  copying src/lxml/sax.py -> build/lib.linux-x86_64-2.7/lxml
  copying src/lxml/_elementpath.py -> build/lib.linux-x86_64-2.7/lxml
  copying src/lxml/__init__.py -> build/lib.linux-x86_64-2.7/lxml
  copying src/lxml/doctestcompare.py -> build/lib.linux-x86_64-2.7/lxml
  creating build/lib.linux-x86_64-2.7/lxml/includes
  copying src/lxml/includes/__init__.py ->
build/lib.linux-x86_64-2.7/lxml/includes
  creating build/lib.linux-x86_64-2.7/lxml/html
  copying src/lxml/html/_html5builder.py ->
build/lib.linux-x86_64-2.7/lxml/html
  copying src/lxml/html/builder.py -> build/lib.linux-x86_64-2.7/lxml/html
  copying src/lxml/html/html5parser.py ->
build/lib.linux-x86_64-2.7/lxml/html
  copying src/lxml/html/_diffcommand.py ->
build/lib.linux-x86_64-2.7/lxml/html
  copying src/lxml/html/clean.py -> build/lib.linux-x86_64-2.7/lxml/html
  copying src/lxml/html/formfill.py -> build/lib.linux-x86_64-2.7/lxml/html
  copying src/lxml/html/usedoctest.py ->
build/lib.linux-x86_64-2.7/lxml/html
  copying src/lxml/html/_setmixin.py ->
build/lib.linux-x86_64-2.7/lxml/html
  copying src/lxml/html/soupparser.py ->
build/lib.linux-x86_64-2.7/lxml/html
  copying src/lxml/html/ElementSoup.py ->
build/lib.linux-x86_64-2.7/lxml/html
  copying src/lxml/html/__init__.py -> build/lib.linux-x86_64-2.7/lxml/html
  copying src/lxml/html/defs.py -> build/lib.linux-x86_64-2.7/lxml/html
  copying src/lxml/html/diff.py -> build/lib.linux-x86_64-2.7/lxml/html
  creating build/lib.linux-x86_64-2.7/lxml/isoschematron
  copying src/lxml/isoschematron/__init__.py ->
build/lib.linux-x86_64-2.7/lxml/isoschematron
  copying src/lxml/lxml.etree.h -> build/lib.linux-x86_64-2.7/lxml
  copying src/lxml/lxml.etree_api.h -> build/lib.linux-x86_64-2.7/lxml
  copying src/lxml/includes/relaxng.pxd ->
build/lib.linux-x86_64-2.7/lxml/includes
  copying src/lxml/includes/xmlerror.pxd ->
build/lib.linux-x86_64-2.7/lxml/includes
  copying src/lxml/includes/etreepublic.pxd ->
build/lib.linux-x86_64-2.7/lxml/includes
  copying src/lxml/includes/c14n.pxd ->
build/lib.linux-x86_64-2.7/lxml/includes
  copying src/lxml/includes/dtdvalid.pxd ->
build/lib.linux-x86_64-2.7/lxml/includes
  copying src/lxml/includes/config.pxd ->
build/lib.linux-x86_64-2.7/lxml/includes
  copying src/lxml/includes/schematron.pxd ->
build/lib.linux-x86_64-2.7/lxml/includes
  copying src/lxml/includes/xpath.pxd ->
build/lib.linux-x86_64-2.7/lxml/includes
  copying src/lxml/includes/xmlparser.pxd ->
build/lib.linux-x86_64-2.7/lxml/includes
  copying src/lxml/includes/tree.pxd ->
build/lib.linux-x86_64-2.7/lxml/includes
  copying src/lxml/includes/xslt.pxd ->
build/lib.linux-x86_64-2.7/lxml/includes
  copying src/lxml/includes/htmlparser.pxd ->
build/lib.linux-x86_64-2.7/lxml/includes
  copying src/lxml/includes/xmlschema.pxd ->
build/lib.linux-x86_64-2.7/lxml/includes
  copying src/lxml/includes/uri.pxd ->
build/lib.linux-x86_64-2.7/lxml/includes
  copying src/lxml/includes/xinclude.pxd ->
build/lib.linux-x86_64-2.7/lxml/includes
  copying src/lxml/includes/lxml-version.h ->
build/lib.linux-x86_64-2.7/lxml/includes
  copying src/lxml/includes/etree_defs.h ->
build/lib.linux-x86_64-2.7/lxml/includes
  creating build/lib.linux-x86_64-2.7/lxml/isoschematron/resources
  creating build/lib.linux-x86_64-2.7/lxml/isoschematron/resources/rng
  copying src/lxml/isoschematron/resources/rng/iso-schematron.rng ->
build/lib.linux-x86_64-2.7/lxml/isoschematron/resources/rng
  creating build/lib.linux-x86_64-2.7/lxml/isoschematron/resources/xsl
  copying src/lxml/isoschematron/resources/xsl/XSD2Schtrn.xsl ->
build/lib.linux-x86_64-2.7/lxml/isoschematron/resources/xsl
  copying src/lxml/isoschematron/resources/xsl/RNG2Schtrn.xsl ->
build/lib.linux-x86_64-2.7/lxml/isoschematron/resources/xsl
  creating
build/lib.linux-x86_64-2.7/lxml/isoschematron/resources/xsl/iso-schematron-xslt1
  copying
src/lxml/isoschematron/resources/xsl/iso-schematron-xslt1/iso_dsdl_include.xsl
 ->
build/lib.linux-x86_64-2.7/lxml/isoschematron/resources/xsl/iso-schematron-xslt1
  copying
src/lxml/isoschematron/resources/xsl/iso-schematron-xslt1/iso_schematron_message.xsl
 ->
build/lib.linux-x86_64-2.7/lxml/isoschematron/resources/xsl/iso-schematron-xslt1
  copying
src/lxml/isoschematron/resources/xsl/iso-schematron-xslt1/iso_svrl_for_xslt1.xsl
 ->
build/lib.linux-x86_64-2.7/lxml/isoschematron/resources/xsl/iso-schematron-xslt1
  copying
src/lxml/isoschematron/resources/xsl/iso-schematron-xslt1/iso_abstract_expand.xsl
 ->
build/lib.linux-x86_64-2.7/lxml/isoschematron/resources/xsl/iso-schematron-xslt1
  copying
src/lxml/isoschematron/resources/xsl/iso-schematron-xslt1/iso_schematron_skeleton_for_xslt1.xsl
 ->
build/lib.linux-x86_64-2.7/lxml/isoschematron/resources/xsl/iso-schematron-xslt1
  copying
src/lxml/isoschematron/resources/xsl/iso-schematron-xslt1/readme.txt ->
build/lib.linux-x86_64-2.7/lxml/isoschematron/resources/xsl/iso-schematron-xslt1
  running build_ext
  building 'lxml.etree' extension
  creating build/temp.linux-x86_64-2.7
  creating build/temp.linux-x86_64-2.7/src
  creating build/temp.linux-x86_64-2.7/src/lxml
  gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall
-Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong
--param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic
-D_GNU_SOURCE -fPIC -fwrapv -DNDEBUG -O2 -g -pipe -Wall
-Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong
--param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic
-D_GNU_SOURCE -fPIC -fwrapv -fPIC -I/usr/include/libxml2
-Isrc/lxml/includes -I/usr/include/python2.7 -c src/lxml/lxml.etree.c -o
build/temp.linux-x86_64-2.7/src/lxml/lxml.etree.o -w
  {standard input}: Assembler messages:
  {standard input}: Error: open CFI at the end of file;
missing .cfi_endproc directive
  gcc: internal compiler error: Killed (program cc1)
  Please submit a full bug report,
  with preprocessed source if appropriate.
  See <http://bugzilla.redhat.com/bugzilla> for instructions.
  Compile failed: command 'gcc' failed with exit status 4
  creating tmp
  cc -I/usr/include/libxml2 -I/usr/include/libxml2
-c /tmp/xmlXPathInitU3APeb.c -o tmp/xmlXPathInitU3APeb.o
  cc tmp/xmlXPathInitU3APeb.o -L/usr/lib64 -lxml2 -o a.out
  error: command 'gcc' failed with exit status 4

  ----------------------------------------
  Failed building wheel for lxml
  Running setup.py clean for lxml
Failed to build lxml
Installing collected packages: lxml
  Running setup.py install for lxml ... error
    Complete output from command /home/kokone/.virtualenv/bin/python -u -c
"import setuptools, tokenize;__file__='/tmp/pip-v7rlpA-build/setup.py';exec
(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n',
'\n'), __file__, 'exec'))" install
--record /tmp/pip-x6hFv8-record/install-record.txt
--single-version-externally-managed --compile
--install-headers /home/kokone/.virtualenv/include/site/python2.7/lxml:
    Building lxml version 3.5.0.
    Building without Cython.
    Using build configuration of libxslt 1.1.28
    Building against libxml2/libxslt in the following directory: /usr/lib64
    running install
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-2.7
    creating build/lib.linux-x86_64-2.7/lxml
    copying src/lxml/builder.py -> build/lib.linux-x86_64-2.7/lxml
    copying src/lxml/cssselect.py -> build/lib.linux-x86_64-2.7/lxml
    copying src/lxml/pyclasslookup.py -> build/lib.linux-x86_64-2.7/lxml
    copying src/lxml/usedoctest.py -> build/lib.linux-x86_64-2.7/lxml
    copying src/lxml/ElementInclude.py -> build/lib.linux-x86_64-2.7/lxml
    copying src/lxml/sax.py -> build/lib.linux-x86_64-2.7/lxml
    copying src/lxml/_elementpath.py -> build/lib.linux-x86_64-2.7/lxml
    copying src/lxml/__init__.py -> build/lib.linux-x86_64-2.7/lxml
    copying src/lxml/doctestcompare.py -> build/lib.linux-x86_64-2.7/lxml
    creating build/lib.linux-x86_64-2.7/lxml/includes
    copying src/lxml/includes/__init__.py ->
build/lib.linux-x86_64-2.7/lxml/includes
    creating build/lib.linux-x86_64-2.7/lxml/html
    copying src/lxml/html/_html5builder.py ->
build/lib.linux-x86_64-2.7/lxml/html
    copying src/lxml/html/builder.py ->
build/lib.linux-x86_64-2.7/lxml/html
    copying src/lxml/html/html5parser.py ->
build/lib.linux-x86_64-2.7/lxml/html
    copying src/lxml/html/_diffcommand.py ->
build/lib.linux-x86_64-2.7/lxml/html
    copying src/lxml/html/clean.py -> build/lib.linux-x86_64-2.7/lxml/html
    copying src/lxml/html/formfill.py ->
build/lib.linux-x86_64-2.7/lxml/html
    copying src/lxml/html/usedoctest.py ->
build/lib.linux-x86_64-2.7/lxml/html
    copying src/lxml/html/_setmixin.py ->
build/lib.linux-x86_64-2.7/lxml/html
    copying src/lxml/html/soupparser.py ->
build/lib.linux-x86_64-2.7/lxml/html
    copying src/lxml/html/ElementSoup.py ->
build/lib.linux-x86_64-2.7/lxml/html
    copying src/lxml/html/__init__.py ->
build/lib.linux-x86_64-2.7/lxml/html
    copying src/lxml/html/defs.py -> build/lib.linux-x86_64-2.7/lxml/html
    copying src/lxml/html/diff.py -> build/lib.linux-x86_64-2.7/lxml/html
    creating build/lib.linux-x86_64-2.7/lxml/isoschematron
    copying src/lxml/isoschematron/__init__.py ->
build/lib.linux-x86_64-2.7/lxml/isoschematron
    copying src/lxml/lxml.etree.h -> build/lib.linux-x86_64-2.7/lxml
    copying src/lxml/lxml.etree_api.h -> build/lib.linux-x86_64-2.7/lxml
    copying src/lxml/includes/relaxng.pxd ->
build/lib.linux-x86_64-2.7/lxml/includes
    copying src/lxml/includes/xmlerror.pxd ->
build/lib.linux-x86_64-2.7/lxml/includes
    copying src/lxml/includes/etreepublic.pxd ->
build/lib.linux-x86_64-2.7/lxml/includes
    copying src/lxml/includes/c14n.pxd ->
build/lib.linux-x86_64-2.7/lxml/includes
    copying src/lxml/includes/dtdvalid.pxd ->
build/lib.linux-x86_64-2.7/lxml/includes
    copying src/lxml/includes/config.pxd ->
build/lib.linux-x86_64-2.7/lxml/includes
    copying src/lxml/includes/schematron.pxd ->
build/lib.linux-x86_64-2.7/lxml/includes
    copying src/lxml/includes/xpath.pxd ->
build/lib.linux-x86_64-2.7/lxml/includes
    copying src/lxml/includes/xmlparser.pxd ->
build/lib.linux-x86_64-2.7/lxml/includes
    copying src/lxml/includes/tree.pxd ->
build/lib.linux-x86_64-2.7/lxml/includes
    copying src/lxml/includes/xslt.pxd ->
build/lib.linux-x86_64-2.7/lxml/includes
    copying src/lxml/includes/htmlparser.pxd ->
build/lib.linux-x86_64-2.7/lxml/includes
    copying src/lxml/includes/xmlschema.pxd ->
build/lib.linux-x86_64-2.7/lxml/includes
    copying src/lxml/includes/uri.pxd ->
build/lib.linux-x86_64-2.7/lxml/includes
    copying src/lxml/includes/xinclude.pxd ->
build/lib.linux-x86_64-2.7/lxml/includes
    copying src/lxml/includes/lxml-version.h ->
build/lib.linux-x86_64-2.7/lxml/includes
    copying src/lxml/includes/etree_defs.h ->
build/lib.linux-x86_64-2.7/lxml/includes
    creating build/lib.linux-x86_64-2.7/lxml/isoschematron/resources
    creating build/lib.linux-x86_64-2.7/lxml/isoschematron/resources/rng
    copying src/lxml/isoschematron/resources/rng/iso-schematron.rng ->
build/lib.linux-x86_64-2.7/lxml/isoschematron/resources/rng
    creating build/lib.linux-x86_64-2.7/lxml/isoschematron/resources/xsl
    copying src/lxml/isoschematron/resources/xsl/XSD2Schtrn.xsl ->
build/lib.linux-x86_64-2.7/lxml/isoschematron/resources/xsl
    copying src/lxml/isoschematron/resources/xsl/RNG2Schtrn.xsl ->
build/lib.linux-x86_64-2.7/lxml/isoschematron/resources/xsl
    creating
build/lib.linux-x86_64-2.7/lxml/isoschematron/resources/xsl/iso-schematron-xslt1
    copying
src/lxml/isoschematron/resources/xsl/iso-schematron-xslt1/iso_dsdl_include.xsl
 ->
build/lib.linux-x86_64-2.7/lxml/isoschematron/resources/xsl/iso-schematron-xslt1
    copying
src/lxml/isoschematron/resources/xsl/iso-schematron-xslt1/iso_schematron_message.xsl
 ->
build/lib.linux-x86_64-2.7/lxml/isoschematron/resources/xsl/iso-schematron-xslt1
    copying
src/lxml/isoschematron/resources/xsl/iso-schematron-xslt1/iso_svrl_for_xslt1.xsl
 ->
build/lib.linux-x86_64-2.7/lxml/isoschematron/resources/xsl/iso-schematron-xslt1
    copying
src/lxml/isoschematron/resources/xsl/iso-schematron-xslt1/iso_abstract_expand.xsl
 ->
build/lib.linux-x86_64-2.7/lxml/isoschematron/resources/xsl/iso-schematron-xslt1
    copying
src/lxml/isoschematron/resources/xsl/iso-schematron-xslt1/iso_schematron_skeleton_for_xslt1.xsl
 ->
build/lib.linux-x86_64-2.7/lxml/isoschematron/resources/xsl/iso-schematron-xslt1
    copying
src/lxml/isoschematron/resources/xsl/iso-schematron-xslt1/readme.txt ->
build/lib.linux-x86_64-2.7/lxml/isoschematron/resources/xsl/iso-schematron-xslt1
    running build_ext
    building 'lxml.etree' extension
    creating build/temp.linux-x86_64-2.7
    creating build/temp.linux-x86_64-2.7/src
    creating build/temp.linux-x86_64-2.7/src/lxml
    gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall
-Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong
--param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic
-D_GNU_SOURCE -fPIC -fwrapv -DNDEBUG -O2 -g -pipe -Wall
-Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong
--param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic
-D_GNU_SOURCE -fPIC -fwrapv -fPIC -I/usr/include/libxml2
-Isrc/lxml/includes -I/usr/include/python2.7 -c src/lxml/lxml.etree.c -o
build/temp.linux-x86_64-2.7/src/lxml/lxml.etree.o -w
    {standard input}: Assembler messages:
    {standard input}:225029: Error: unknown pseudo-op: `.lvl'
    {standard input}: Error: open CFI at the end of file;
missing .cfi_endproc directive
    gcc: internal compiler error: Killed (program cc1)
    Please submit a full bug report,
    with preprocessed source if appropriate.
    See <http://bugzilla.redhat.com/bugzilla> for instructions.
    Compile failed: command 'gcc' failed with exit status 4
    cc -I/usr/include/libxml2 -I/usr/include/libxml2
-c /tmp/xmlXPathInitDAarZm.c -o tmp/xmlXPathInitDAarZm.o
    cc tmp/xmlXPathInitDAarZm.o -L/usr/lib64 -lxml2 -o a.out
    error: command 'gcc' failed with exit status 4

    ----------------------------------------
Command "/home/kokone/.virtualenv/bin/python -u -c "import setuptools,
tokenize;__file__='/tmp/pip-v7rlpA-build/setup.py';exec(compile(getattr
(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__,
'exec'))" install --record /tmp/pip-x6hFv8-record/install-record.txt
--single-version-externally-managed --compile
--install-headers /home/kokone/.virtualenv/include/site/python2.7/lxml"
failed with error code 1 in /tmp/pip-v7rlpA-build/
[centos@MyDirectory]$

Additional information:  the original requirements file listed a lot of
packages.  They were all installed successfully.

If this kind of problem is outside your group's area of expertise or
interest, please let me know which group I should contact.  I would highly
appreciate any suggestions.

Sincerely,
Alex Ayzenberg

_________________________________________________________________

Remove &#13; from etree.tostring() output

Hi folks, I have yet another stupid question😂. I'm getting the carriage return as an entity (&#13;) in the HTML I generate. Is there a way to get rid of it?

I haven't found anything after a lot of duckduckgo'ing.

Thanks a lot
/PA

_________________________________________________________________
Dag Sverre Seljebotn | 8 Jun 10:38 2016

xpath suddenly slow on new laptop/install?

Hi,

I just got a new laptop, and after setting it up, my xpath queries with lxml run very much slower. On my old laptop, going through a lot of XML files finishes in 30 seconds; on the new one I wait for hours without it completing.

Both are Core i7 machines with lots of memory, just with 5 years between them. The XML files are a couple of megabytes each.

On the new laptop I've tried lxml from Ubuntu, from Anaconda, and building it myself.

Are there any obvious things to check (a fallback to a Python implementation if a package is missing, or similar), or do I need to dig deeper? If necessary I'll debug further by moving installs back and forth between the machines, building old versions of lxml, or profiling lxml; I haven't done that yet.


Dag Sverre

_________________________________________________________________
Rainer Hoerbe | 7 Jun 18:54 2016

Disable DTD parsing

What options does lxml offer to prevent the parser from processing DTDs, i.e. to reject any XML that
contains a DTD (for security reasons)?

Best regards
Rainer
_________________________________________________________________
Jens Tröger | 4 Jun 08:27 2016

XInclude and xml:base

Hello,

My document uses xi:include to include another XML document, and by
default this adds an xml:base="common.xml" to the included node(s).
Using 

  xmllint --xinclude --nofixup-base-uris document.xml 

I can avoid adding xml:base.  How can I achieve the same with lxml?
I.e. how do I avoid the base?  It seems that setting _Element.base to
None is a poor, even unstable solution.

Thanks!
Jens 

-- 
Jens Tröger
http://savage.light-speed.de/
_________________________________________________________________
Jens Tröger | 25 May 19:48 2016

Schema validation fails after replace() call?

Hello,

I'm not quite sure if I'm asking a question, or sharing an observation.

Is it possible that an lxml.etree tree validates before a replace()
call, but not after?  The lxml validation gives me almost 200 copies of
the same error message:

  <string>:440:0:ERROR:SCHEMASV:SCHEMAV_CVC_IDC: Element 'deviation': No match found for
key-sequence ['WtC2eoepX'] of keyref 'deviationstyle-refer'.

Looking at the actual XML I can positively confirm that the IDs and
IDREFs exist and are valid, before and after the replace() call.

The new subtree is equivalent to the old one, but there are elements in
the whole tree that refer to elements in the replaced subtree.  I
suspect that this causes the problem.

Interestingly:
 - xmllint validates the new tree when written to a file, and
 - if I serialize the entire tree (including the new replaced subtree)
   and parse it back, it validates.
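
The second observation, the serialize-and-reparse round trip, as a minimal sketch. The helper name reparse and the sample document (including the style element and the ref attribute) are made up for illustration; the import falls back to the standard library's ElementTree only so that the snippet also runs where lxml is not installed.

```python
try:
    from lxml import etree  # the library this thread is about
except ImportError:
    # Fallback so the sketch runs without lxml; same tostring/fromstring calls.
    import xml.etree.ElementTree as etree


def reparse(root):
    """Serialize an element tree and parse the result into a fresh tree."""
    return etree.fromstring(etree.tostring(root))


# Hypothetical stand-in for the real document.
root = etree.fromstring("<doc><style id='s1'/><deviation ref='s1'/></doc>")
fresh = reparse(root)
print(fresh is not root)                                # True
print(etree.tostring(fresh) == etree.tostring(root))    # True
```

The freshly parsed tree is what validates in my case, while the modified in-memory tree does not.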

Is this intended behavior, an odd side effect, or a bug?

Thanks!
Jens

-- 
Jens Tröger
http://savage.light-speed.de/
_________________________________________________________________
Stein Rune Risa | 6 May 20:26 2016

Unable to find element with xpath


I have an XML document that looks like this:


I've previously used lxml for parsing html documents and would like to use it for XML documents as well.

I am interested in finding all the "Lap" elements in the XML file and have written some simple python code:

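    from lxml import etree

    # Read the file as bytes so the parser can honor the XML encoding declaration
    with open('NAME_AND_PATH_OF_XML', 'rb') as sourcefile:
        sourceXML = sourcefile.read()
    root = etree.fromstring(sourceXML)
    # Select every "Lap" element anywhere in the document
    laps = root.xpath('//Lap')
    print(len(laps))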

For some reason it cannot find "Lap" in the XML. The XML seems to be valid when I open it with a browser.

Any suggestions?

Best regards
Ziggy999
_________________________________________________________________

Question regarding splitting documents

Hi folks,

I'm attaching a small sample program. My intent is to split the HTML snippet into smaller html documents using the <h4> tags as the splitting points.

Any clues?

Thanks, /PA

--
Questions are not there to be answered,
questions are there to be asked.
Georg Kreisler
Attachment (sample1.py): text/x-python, 1766 bytes
_________________________________________________________________
Heiko Nardmann | 14 Apr 16:24 2016

Q: lxml Windows wheels

I have some trouble using the offered Windows wheels; the following is
what I get from pip:

    Could not find a version that satisfies the requirement lxml (from
versions: )

So I had a look at the WHEEL file inside the ZIP
lxml-3.6.0-cp32-none-win32.whl to see which requirements are stated inside.

Is it okay that 'Tag' is set to 'cp27-none-linux_x86_64' inside that
file? Might this be the reason why no version can be found? Maybe
someone can enlighten me with respect to wheels?

I have to admit that I'm completely new to wheels and quite new to
Python packaging but I wouldn't expect metadata with "Linux" inside a
Windows wheel?
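
For anyone who wants to repeat the check, here is a minimal sketch of reading the Tag entries from a wheel's WHEEL file with the standard library. The helper name wheel_tags and the demo archive are made up for illustration; a real call would pass the path of the .whl file instead of the in-memory stand-in.

```python
import io
import zipfile


def wheel_tags(wheel):
    """Return the 'Tag:' values from the WHEEL metadata file of a wheel archive."""
    with zipfile.ZipFile(wheel) as zf:
        # The metadata file lives at <name>-<version>.dist-info/WHEEL.
        name = next(n for n in zf.namelist() if n.endswith(".dist-info/WHEEL"))
        text = zf.read(name).decode("utf-8")
    return [line.split(":", 1)[1].strip()
            for line in text.splitlines() if line.startswith("Tag:")]


# Build a tiny stand-in archive in memory so the sketch runs without a real wheel.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("demo-1.0.dist-info/WHEEL",
                "Wheel-Version: 1.0\nTag: cp27-none-linux_x86_64\n")

print(wheel_tags(buf))   # ['cp27-none-linux_x86_64']
```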

Thx in advance!

Kind regards,

  Heiko Nardmann

_________________________________________________________________

soupparser

    Hello everyone ! 

I am currently working on a project that involves scraping and parsing intranet resources. Until now I have used BeautifulSoup, but I decided to shave off every millisecond possible, hence the switch to lxml. Still, handling most of the encoding issues will be a pain, so naturally I wanted to benefit from bs4's encoding capabilities. To do so I have to:
   
        >>> from lxml.html.soupparser import fromstring

But I am presented with an ImportError; the module BeautifulSoup is not found.

Of course that's super easy to fix, but my question is: am I missing something? What's the bigger picture here, so to say, or is it just a bug?

On my Window$ computer at the office the module is named bs4, same for the Linux and BSD computers at home.

Obviously, I am using BeautifulSoup4 (4.4.1 to be exact).

Thank you ! 
Nikola