Tim Browski | 29 Jul 23:44 2015

issue with findtext and find

Hi,

In the following scenario, find() and findtext() do not return the element. Is this intended behaviour or a possible bug?


<a> <b> </b> <b> <c></c> </b> </a>

When I call elm.find("b/c") or elm.findtext("b/c") on the element, it returns None.

However, when I call find() with the exact XPath location, it returns the expected result.

Best,
Tim
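
A minimal reproduction sketch of the report above, for anyone who wants to check their own lxml version. It assumes that "elm" is the parsed <a> element and that "exact XPath location" means addressing the second <b> explicitly; neither is stated in the message.

```python
from lxml import etree

# The document from the report, parsed as-is (whitespace text nodes included).
root = etree.fromstring("<a> <b> </b> <b> <c></c> </b> </a>")

# Relative ElementPath lookups, as in the report.
via_find = root.find("b/c")
via_findtext = root.findtext("b/c")

# Assumption: one possible reading of "exact XPath location".
via_xpath = root.xpath("/a/b[2]/c")

print("lxml version:", etree.LXML_VERSION)
print("find     ->", via_find)
print("findtext ->", repr(via_findtext))
print("xpath    ->", via_xpath)
```

Comparing the three printed values across lxml versions should show whether the behaviour is version-specific. Note that <c></c> has no text, so findtext() has nothing to return even when the element is matched.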
    


_________________________________________________________________
Mailing list for the lxml Python XML toolkit - http://lxml.de/
lxml <at> lxml.de
https://mailman-mail5.webfaction.com/listinfo/lxml
Gilles Lenfant | 20 Jul 14:37 2015

Weird (?) tree.getpath behaviour

Hi,

I use tree.getpath(element) to solve a problem, and I need it to return a meaningful path.

In the example at http://pastebin.com/1Xjprfui the getpath method behaves exactly as expected when I use it on an XML document with no namespace, or when all elements have a namespace prefix (examples 1 and 3).

But if I use tree.getpath(element) on a document with a default namespace (second example), I get odd results like "/*", "/*/*/*[2]" and so on, where I expected meaningful paths like "/root", "/root/parent/child[2]".

Is this a bug or a feature? If it is a feature, is there a workaround to get meaningful path expressions?

Thanks in advance for any help.
--
Gilles Lenfant
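
One possible workaround, sketched under assumptions: walk up from the element with getparent() and build the path from local names only. The helper name readable_path and the sample document are hypothetical (the pastebin content is not reproduced here), and the helper assumes it is only given ordinary elements, not comments or processing instructions.

```python
from lxml import etree

def readable_path(element):
    """Build a /root/parent/child[2]-style path from local names only."""
    steps = []
    node = element
    while node is not None:
        name = etree.QName(node).localname
        # Count same-tag siblings on both sides to decide whether an index is needed.
        before = sum(1 for _ in node.itersiblings(tag=node.tag, preceding=True))
        after = sum(1 for _ in node.itersiblings(tag=node.tag))
        if before or after:
            steps.append("%s[%d]" % (name, before + 1))
        else:
            steps.append(name)
        node = node.getparent()
    return "/" + "/".join(reversed(steps))

# Hypothetical document with a default namespace.
doc = etree.fromstring(
    '<root xmlns="http://example.com/ns">'
    "<parent><child/><child/></parent>"
    "</root>"
)
second_child = doc[0][1]
print(doc.getroottree().getpath(second_child))
print(readable_path(second_child))
```

The resulting string is meant for display or logging. Because it drops the namespace, it cannot be evaluated as an XPath expression against the namespaced document without mapping a prefix first.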

_________________________________________________________________
Rice, Nathan Alexander | 13 Jul 18:29 2015

debugging issues with application of XSLT

Hello,

I'm currently attempting to apply an XSL transform to an XML document using lxml. Unfortunately, when I do
this I get the following traceback:

    /usr/lib/python2.7/dist-packages/lxml/etree.so in lxml.etree.XSLT.__call__ (src/lxml/lxml.etree.c:160146)()

    XSLTApplyError: Failed to evaluate the 'select' expression.

Given that there are quite a few select expressions in the XSL document (and more than one may be at fault), is
there any way to get lxml to report the specific expression that triggered this error?

Thank you,

Nathan
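
A sketch of one place to look: an XSLT object keeps an error_log with the messages libxslt emitted during the last run, and each entry carries a line number and message. Whether the log names the failing select expression depends on what libxslt reports, so treat this as a starting point. The helper name apply_with_log and the sample stylesheet are hypothetical.

```python
from lxml import etree

def apply_with_log(transform, doc):
    """Run an XSLT and return (result_or_None, list_of_log_lines)."""
    try:
        result = transform(doc)
    except etree.XSLTApplyError:
        result = None
    # The log is kept on the XSLT object whether or not the run failed.
    lines = ["line %s: %s" % (entry.line, entry.message)
             for entry in transform.error_log]
    return result, lines

# Hypothetical stylesheet that only emits a message, to show the log in use.
stylesheet = etree.XML("""\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <xsl:message terminate="no">STARTING</xsl:message>
    <out><xsl:value-of select="/a/b"/></out>
  </xsl:template>
</xsl:stylesheet>""")

transform = etree.XSLT(stylesheet)
result, lines = apply_with_log(transform, etree.XML("<a><b>text</b></a>"))
for line in lines:
    print(line)
```

Adding temporary xsl:message elements around the suspect templates is another way to narrow down which select expression is evaluated last before the failure.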
_________________________________________________________________
Bjorn Holmgren | 13 Jul 14:07 2015

Windows 7, x64

Hi,


I need to install lxml, but the installation fails. I install it with: pip install lxml. I have Python 3.4 installed on a Windows 7 x64 computer, and Visual Studio 2010 installed as required by Python 3.4.


I get this error message:

    C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\BIN\cl.exe /c /nologo /Ox /MD /W3 /GS- /DNDEBUG -Ic:\users\ashhab\appdata\local\temp\pip-build-0r0v7a\lxml\src\lxml\includes -IC:\Python33\include -IC:\Python33\include /Tcsrc\lxml\lxml.etree.c /Fobuild\temp.win32-3.3\Release\src\lxml\lxml.etree.obj -w
    cl : Command line warning D9025 : overriding '/W3' with '/w'
    lxml.etree.c
    c:\users\ashhab\appdata\local\temp\pip-build-0r0v7a\lxml\src\lxml\includes\etree_defs.h(14) : fatal error C1083: Cannot open include file: 'libxml/xmlversion.h': No such file or directory
    C:\Python33\lib\distutils\dist.py:258: UserWarning: Unknown distribution option: 'bugtrack_url'
      warnings.warn(msg)
    error: command '"C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\BIN\cl.exe"' failed with exit status 2


Can someone help me install lxml on Windows 7?


Regards,
Björn






_________________________________________________________________
John Munroe | 26 Jun 08:14 2015

Error from .itertext() “ValueError: Input object has no element: HtmlComment”

Hi

I'm trying to iterate through the text content of a subtree using elt.itertext() (v3.5.0b1 git master
branch) as follows:

import lxml.html.soupparser as soupparser
import requests

doc = requests.get("http://f10.5post.com/forums/showthread.php?t=1142017").content
tree = soupparser.fromstring(doc)

nodes = list(tree)  # direct children; getchildren() is deprecated

for elt in nodes:
    for t in elt.itertext():
        print(t)

But I keep getting an error saying

 File "src/lxml/iterparse.pxi", line 248, in lxml.etree.iterwalk.__init__ (src/lxml/lxml.etree.c:134032)
 File "src/lxml/apihelpers.pxi", line 67, in lxml.etree._rootNodeOrRaise (src/lxml/lxml.etree.c:15220)
ValueError: Input object has no element: HtmlComment

Is there a way to skip all HTML comments? Also, what does this error actually mean?

Any help will be appreciated.

Thanks

John
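
A sketch of one way to skip comments while collecting text. In lxml, comments and processing instructions appear as children too, but their .tag is a factory function rather than a string, which makes them easy to filter. To stay self-contained this uses lxml.html.fromstring on a small inline fragment instead of soupparser and the forum page; that swap is an assumption, and the fragment is hypothetical.

```python
import lxml.html

# Hypothetical fragment with a comment between ordinary elements.
tree = lxml.html.fromstring(
    "<div><!-- a comment --><p>Hello <b>world</b></p><p>again</p></div>"
)

texts = []
for elt in tree:
    # Comments and processing instructions do not have a string tag.
    if not isinstance(elt.tag, str):
        continue
    texts.extend(elt.itertext())

print(texts)
```

Calling tree.itertext() on the parent element is another option, since it walks the whole subtree in one go.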

_________________________________________________________________
John Munroe | 24 Jun 10:10 2015

Build error: No such file or directory: 'src/lxml/lxml.etree.c'

Hi,

I've grabbed 3.5.0beta1 from GitHub and tried building it. I'm on OS X and have libxml2 2.9.2 rather than
libxml2 2.9.1, so I'm using the following command to build:

python setup.py build --static-deps --libxml2-version=2.9.2 --without-cython

but I keep getting an error saying

clang: error: no such file or directory: 'src/lxml/lxml.etree.c'
clang: error: no input files

Indeed, the C file doesn't exist and isn't part of the distribution.

Am I missing something? I'd like to have it installed in a virtualenv (eventually).

Any help will be appreciated.

Thanks

John

_________________________________________________________________
Stefan Behnel | 21 Jun 13:41 2015

Re: Can I change maxvars?

Sam Bull wrote on 21.06.2015 at 12:12:
> On Mon, 2014-07-14 at 20:52 +0200, Stefan Behnel wrote:
>> Sam Bull, 12.07.2014 17:15:
>>> I'm trying to process some XML files, and a few of them are several
>>> thousand lines long, and with the moderately complicated XSL I'm using,
>>> I seem to be hitting recursion limits.
>>>
>>> I'm currently getting this message:
>>>         lxml.etree.XSLTApplyError: xsltApplyXSLTTemplate: A potential
>>>         infinite template recursion was detected.
>>>         You can adjust maxTemplateVars (--maxvars) in order to raise the
>>>         maximum number of variables/params (currently set to 15000).
>>>
>>> It says I can adjust the value, but doesn't explain how, nor is this
>>> value mentioned anywhere in the documentation.
>>>
>>> I've just had to change the maxdepth, which can be done with
>>> XSLT.set_global_max_depth(), but there doesn't appear to be an
>>> equivalent for maxvars. How can I change this value?
>>
>> You can't currently. The problem is, it was new in libxslt 1.1.27, and even
>> the next lxml release will still support everything back to 1.1.23, so this
>> needs a little C level hacking to support depending on the libxslt version
>> it compiles against.
>>
>> The upside is that libxslt 1.1.27 also introduced a per-context setting
>> (maxTemplateVars), i.e. you can define the value for each stylesheet run
>> rather than setting a global value. A new keyword argument for XSLT()
>> should work nicely here, e.g. "max_recursion_vars". The same applies to
>> "maxTemplateDepth" in 1.1.27, which could be set as "max_recursion_depth"
>> in XSLT().
> 
> Don't suppose there's been any progress on this?

No. Pull requests still welcome.

Stefan

_________________________________________________________________
Dionyz Lazar | 16 Jun 12:24 2015

Iterparse memory problem

Hello, 

I have been using lxml (3.4.3) to parse XML feeds from vendors. For example, here is one of the smaller files, which should be publicly available: http://www.eberry.cz/editor/image/eshop_products/feed_seznam_jyxo.xml

I use urllib3 to get the response, which should be a file-like object, and pass it straight to the iterparse method. This works well memory-wise, as it does not have to load the whole file into memory (some files can be huge).

I am interested only in the SHOPITEM element, and I clear() each element once I am done with it. I tried the tag argument of iterparse to receive only the events relevant to this element. When I do that, memory usage spikes and it looks as if the whole file is being held in memory.

Any ideas on what could cause this behavior? 

Regards,
Dio
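
A sketch of the usual iterparse clean-up pattern, which may or may not be what is missing here: clear() empties an element but leaves it attached to its parent, so the processed siblings keep accumulating under the root unless they are deleted as well. The generator name iter_shop_items, the PRODUCT child and the inline sample feed are assumptions for illustration.

```python
import io
from lxml import etree

def iter_shop_items(source):
    """Yield one value per SHOPITEM while keeping the tree small."""
    for _event, elem in etree.iterparse(source, events=("end",), tag="SHOPITEM"):
        # Hypothetical processing step: read one child value.
        yield elem.findtext("PRODUCT")
        # Free the element's own content ...
        elem.clear()
        # ... and drop the already-processed siblings still held by the parent.
        while elem.getprevious() is not None:
            del elem.getparent()[0]

# Hypothetical stand-in for the HTTP response body.
feed = io.BytesIO(
    b"<SHOP>"
    b"<SHOPITEM><PRODUCT>a</PRODUCT></SHOPITEM>"
    b"<SHOPITEM><PRODUCT>b</PRODUCT></SHOPITEM>"
    b"</SHOP>"
)
products = list(iter_shop_items(feed))
print(products)
```

With a real response object in place of the BytesIO, comparing memory use with and without the sibling deletion should show whether leftover elements explain the spike.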




_________________________________________________________________
Paul Keating | 9 Jun 12:55 2015

How to set up a Soap envelope

My web services people want me to enclose an xml message in the following envelope:

 

<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
  <soapenv:Header/>
  <soapenv:Body>
    … (actual message goes here) …
  </soapenv:Body>
</soapenv:Envelope>

 

I don’t know how to express this in lxml calls, because I’m a total XML novice who understands nothing about namespaces. Pointers would be welcome.

 

 

Regards

 

P
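
A minimal sketch of how the envelope could be built with lxml. Element names in a namespace are written in "{namespace-uri}localname" form, and the nsmap argument tells lxml which prefix to use when serialising. The helper name wrap_in_envelope and the placeholder <Message> payload are hypothetical.

```python
from lxml import etree

SOAPENV = "http://schemas.xmlsoap.org/soap/envelope/"
NSMAP = {"soapenv": SOAPENV}  # prefix -> namespace URI

def wrap_in_envelope(message):
    """Return a soapenv:Envelope whose Body contains the given element."""
    envelope = etree.Element("{%s}Envelope" % SOAPENV, nsmap=NSMAP)
    etree.SubElement(envelope, "{%s}Header" % SOAPENV)
    body = etree.SubElement(envelope, "{%s}Body" % SOAPENV)
    body.append(message)
    return envelope

# Hypothetical payload standing in for the actual message.
payload = etree.Element("Message")
payload.text = "actual message goes here"

envelope = wrap_in_envelope(payload)
print(etree.tostring(envelope, pretty_print=True).decode())
```

The prefix only needs to be declared once on the root element; the Header and Body children pick it up because they are in the same namespace.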

 



_________________________________________________________________
Frederik Elwert | 9 Jun 12:06 2015

xmlfile and namespaces/pretty printing

Hello,

I want to write a very large XML file to disc. Since I ran into memory
issues using the regular ElementTree.write() method, I switched to using
etree.xmlfile. Generally, it works quite well, but I ran into two
issues. Here’s my test code:

----8<----

from lxml import etree

P_DATA = '{http://www.dspin.de/data}'
P_TEXT = '{http://www.dspin.de/data/textcorpus}'

with etree.xmlfile('test.xml', encoding='utf-8') as xf:
    with xf.element(P_DATA + 'D-Spin',
                    nsmap={None: 'http://www.dspin.de/data'}):
        with xf.element(P_TEXT + 'TextCorpus',
                lang='de',
                nsmap={None: 'http://www.dspin.de/data/textcorpus'}):
            element = etree.Element(P_TEXT + 'tokens',
                    nsmap={None: 'http://www.dspin.de/data/textcorpus'})
            element2 = etree.SubElement(element, P_TEXT + 'token')
            xf.write(element, pretty_print=True)

---->8----

And here’s the output:

----8<----
<D-Spin xmlns="http://www.dspin.de/data"><TextCorpus
xmlns="http://www.dspin.de/data/textcorpus" lang="de"><tokens
xmlns="http://www.dspin.de/data/textcorpus">
  <token/>
</tokens>
</TextCorpus></D-Spin>
---->8----

Now my questions are:

1. I had to add an nsmap argument when creating "element" in order
to prevent an "ns0:" prefix in the output. But this led to the
default namespace 'http://www.dspin.de/data/textcorpus' being
declared twice, on both <TextCorpus> and <tokens>.

Since the generation of the Elements that I write to the xmlfile happens
somewhere else in the real code, it is a bit cumbersome to add nsmaps
all over the place. And even then, I have the duplicated namespace
declaration. So ideally I’d like xf.write() to be aware of the current
namespace map defined by the xf.element. Is that possible?

2. I can pass "pretty_print=True" to xf.write(), but it naturally only
affects those sub-trees. Is it possible to pretty-print the elements
generated by xf.element() as well? Maybe it would be nice to be able to
pass pretty_print to etree.xmlfile() itself?

Regards,
Frederik

--

-- 
Dr. Frederik Elwert

Project Manager/SeNeReKo
Postdoctoral Researcher/KHK
Centre for Religious Studies
Ruhr-University Bochum

Universitätsstr. 150
D-44780 Bochum
	
Phone +49(0)234 32-23024
_________________________________________________________________
