Dmitri Fedoruk | 1 Apr 16:13

etree.parse hangs with a lot of parallel requests

Hi all,

I'm currently using lxml-2.0_1 (I have not upgraded to the most recent
versions, as I have not noticed any features relevant to me),
libxml2-2.6.30 and libxslt-1.1.22 on FreeBSD 6.2 and 7.0. The application
runs within mod_python / apache 2.2.8.

My situation is pretty straightforward: fetch XML as plain text via
HTTP, parse it into an etree object, then apply XSLT to get the resulting
HTML.

The code is the following:
self.xmlParser = etree.XMLParser(no_network = False, resolve_entities = False, load_dtd = True )

I use load_dtd=True as sometimes I encounter html entities in my input
data. They are included in my dtd in this way:
<!ENTITY % HTMLlat1 SYSTEM "xhtml-lat1.ent">
%HTMLlat1;

<!ENTITY % HTMLsymbol SYSTEM "xhtml-symbol.ent">
%HTMLsymbol;

<!ENTITY % HTMLspecial SYSTEM "xhtml-special.ent">
%HTMLspecial;

Then eventually it comes up to
...
xmlres = etree.parse( StringIO.StringIO( reply['data'] ), self.xmlParser )


John Krukoff | 1 Apr 22:00

Re: ElementTree.find does not accept QName objects.


On Sat, 2008-03-29 at 11:42 +0100, Stefan Behnel wrote:
> Hi,
> 
> John Krukoff wrote:
> > Since I was the one that complained about the find method on Elements
> > not accepting QNames, it's probably not surprising that I expected them
> > to work with the ElementTree find method as well. Instead an unsliceable
> > error is thrown, due to the value being expected to be a string
> 
> Sure, here's the obvious patch.
> 
> BTW, I expect ET to have the same problem here.
> 
> Stefan
> 

Thanks for your always quick response. Yeah, ET has the same issue, but
then it doesn't accept QNames for element.find either. Only one of many
reasons I gave up on ET compatibility a long time ago.

-- 
John Krukoff <jkrukoff <at> ltgc.com>
Land Title Guarantee Company

John Krukoff | 1 Apr 22:24

Re: ElementTree.find does not accept QName objects.

Okay, that's weird. I knew that I'd been able to use QNames with ET in
the past, but when I double-checked I found that it didn't work for me.
It looks like I just managed to hit some magic special case in ET that
makes this work at all, as this works:

Python 2.5.1 (r251:54863, Jan  8 2008, 15:02:32) 
[GCC 4.1.2 (Gentoo 4.1.2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from elementtree import ElementTree
>>> x = ElementTree.XML( '<a><b/></a>' )
>>> x.find( 'b' )
<Element b at b7c90b0c>
>>> x.find( ElementTree.QName( 'b' ) )
<Element b at b7c90b0c>

But this doesn't:

Python 2.5.1 (r251:54863, Jan  8 2008, 15:02:32) 
[GCC 4.1.2 (Gentoo 4.1.2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from elementtree import ElementTree
>>> x = ElementTree.XML( '<a><b/></a>' )
>>> x.find( ElementTree.QName( 'b' ) )
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/site-packages/elementtree/ElementTree.py",
line 327, in find
    return ElementPath.find(self, path)
  File "/usr/lib/python2.5/site-packages/elementtree/ElementPath.py",
line 183, in find

Holger Joukl | 4 Apr 12:58

[lxml] adding __float__, __int__ etc. to objectify.StringElement

Hi,

I suggest adding __float__, __int__ etc. methods to lxml.objectify
StringElement, to enable things like

>>> float(objectify.DataElement("234"))
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: float() argument must be a string or a number
>>>

These will just try to invoke the very same operation on the
underlying pyval.

Maybe there are other classes where such methods would be helpful
(BoolElement comes to mind, but I'll have to look).

Any objections? I can add this plus some tests otherwise.

Holger




_______________________________________________
lxml-dev mailing list
lxml-dev <at> codespeak.net
http://codespeak.net/mailman/listinfo/lxml-dev

Stefan Behnel | 6 Apr 16:20

Re: ElementTree.find does not accept QName objects.


John Krukoff wrote:
> It looks like it only works when ET.find has already been called with
> the string value of the same name that a following QName find specifies.
> Some internal caching, perhaps?

There is an internal cache in ElementPath.py that might cut in here.

> In any case, it does look like accepting QNames at all is a bug in ET,
> or at least an accident. Don't know what that means for lxml, but it
> would seem to me that strict compatibility would mean that find should
> be restricted to strings.

I think it's right to accept QName objects wherever tag names are accepted. So
it's ET that's wrong here.

Stefan
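
For code that has to run against both lxml and plain ElementTree, one way to avoid depending on whether find() accepts QName objects is to convert the QName to its string form first. The following is only a minimal sketch, assuming the standard-library xml.etree.ElementTree (not the standalone elementtree package used in the sessions above); the helper name find_by_tag and the urn:example namespace are hypothetical.

```python
import xml.etree.ElementTree as ET


def find_by_tag(element, tag):
    """Call element.find() with either a plain string or a QName object.

    Hypothetical helper: QName objects expose their string form as .text,
    so the underlying find() only ever sees a string.
    """
    path = tag.text if isinstance(tag, ET.QName) else tag
    return element.find(path)


root = ET.XML('<a xmlns:n="urn:example"><n:b/><c/></a>')

# Namespaced lookup through a QName, plain lookup through a string.
hit_qname = find_by_tag(root, ET.QName("urn:example", "b"))
hit_string = find_by_tag(root, "c")
```

The wrapper costs one extra call but keeps the calling code independent of how each library treats QName arguments.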

Stefan Behnel | 6 Apr 16:26

Re: etree.parse hangs with a lot of parallel requests

Hi,

Dmitri Fedoruk wrote:
> The code is the following:
> self.xmlParser = etree.XMLParser(no_network = False, resolve_entities
> = False, load_dtd = True )
> 
> I use load_dtd=True as sometimes I encounter html entities in my input
> data. They are included in my dtd in this way:
> <!ENTITY % HTMLlat1 SYSTEM "xhtml-lat1.ent">
> %HTMLlat1;
> 
> <!ENTITY % HTMLsymbol SYSTEM "xhtml-symbol.ent">
> %HTMLsymbol;
> 
> <!ENTITY % HTMLspecial SYSTEM "xhtml-special.ent">
> %HTMLspecial;
> 
> Then eventually it comes up to
> ...
> xmlres = etree.parse( StringIO.StringIO( reply['data'] ), self.xmlParser )
> 
> And here I have serious problems.  Parsing time is usually up to 100
> ms (even this is critical time for me). But sometimes I have 3, 5 and
> even 60 seconds (!) of parsing. This situation happens under a heavy
> load (~20 simultaneous parsings/transformations per sec).
> 
> So, I have several questions:
> 1) What am I doing wrong?
> 2) Is there any way to limit the runtime of the etree.parse? Is there
> any way to kill a thread maybe? I can not afford to wait even 150 ms,
> to say nothing about 1 second and more.

It seems you only want to parse DTDs locally from disc, so setting
"no_network=True" (which is the default in lxml 2.0) should prevent any
accidental remote access.

Does that help?

Stefan
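
On the second question above (limiting how long the caller waits for a parse), a common pattern is to hand the parse to a worker thread and wait for the result with a timeout. The sketch below is an assumption and was not proposed in the thread: it uses the standard-library xml.etree.ElementTree as a stand-in for lxml.etree, and the names parse_with_deadline and _POOL, as well as the pool size, are hypothetical. The worker thread is not killed on timeout; the caller merely stops waiting for it.

```python
import concurrent.futures
import io
import xml.etree.ElementTree as ET  # stand-in for lxml.etree in this sketch

# Shared worker pool; the size of 4 is an arbitrary assumption.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def parse_with_deadline(data, timeout_s, parse=ET.parse):
    """Return the parsed tree, or None if no result arrives in timeout_s."""
    future = _POOL.submit(parse, io.BytesIO(data))
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # cancel() only helps if the task has not started yet; a running
        # parse continues in the background until it finishes.
        future.cancel()
        return None


tree = parse_with_deadline(b"<root><item/></root>", timeout_s=1.0)
```

Whether this helps under mod_python depends on why the parse stalls; it bounds the wait for the response but does not free the resources held by a parse that is still running.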

Stefan Behnel | 6 Apr 16:32

Re: adding __float__, __int__ etc. to objectify.StringElement

Hi,

Holger Joukl wrote:
> I suggest adding __float__, __int__ etc. methods 
> 
> to lxml.objectify StringElement, to enable things like
> 
> >>> float(objectify.DataElement("234"))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> TypeError: float() argument must be a string or a number
> 
> These will just try to invoke the very same operation on the underlying pyval.

Ok with me, StringElement should behave as much like a string as possible.

> Maybe there are other classes where such methods would
> be helpful (BoolElement comes to mind, but I'll have to look).

You mean because of the int/bool duality in Python, but I don't think that's
something we should easily enable without a compelling use case. Remember that
it would mean converting the string value "true" to int(1), I don't think
that's obvious behaviour.

> Any objections? I can add this plus some tests otherwise.

Go ahead.

Stefan

jholg | 7 Apr 15:33

Re: adding __float__, __int__ etc. to objectify.StringElement

Hi,

Checked in as revision 53527:

$ svn diff -r 53526
Index: src/lxml/tests/test_objectify.py
===================================================================
--- src/lxml/tests/test_objectify.py    (revision 53526)
+++ src/lxml/tests/test_objectify.py    (working copy)
@@ -815,7 +815,27 @@
         el = objectify.DataElement(s)
         val = 5
         self.assertRaises(TypeError, el.__mod__, val)
+
+    def test_type_str_as_int(self):
+        v = "1"
+        el = objectify.DataElement(v)
+        self.assertEquals(int(el), 1)
 
+    def test_type_str_as_long(self):
+        v = "1"
+        el = objectify.DataElement(v)
+        self.assertEquals(long(el), 1)
+
+    def test_type_str_as_float(self):
+        v = "1"
+        el = objectify.DataElement(v)
+        self.assertEquals(float(el), 1)
+
+    def test_type_str_as_complex(self):
+        v = "1"
+        el = objectify.DataElement(v)
+        self.assertEquals(complex(el), 1)
+
     def test_type_str_mod_data_elements(self):
         s = "%d %f %s %r"
         el = objectify.DataElement(s)
Index: src/lxml/lxml.objectify.pyx
===================================================================
--- src/lxml/lxml.objectify.pyx (revision 53526)
+++ src/lxml/lxml.objectify.pyx (working copy)
@@ -773,6 +773,18 @@
     def __mod__(self, other):
         return _strValueOf(self) % other
 
+    def __int__(self):
+        return int(textOf(self._c_node))
+
+    def __long__(self):
+        return long(textOf(self._c_node))
+
+    def __float__(self):
+        return float(textOf(self._c_node))
+
+    def __complex__(self):
+        return complex(textOf(self._c_node))
+
 cdef class NoneElement(ObjectifiedDataElement):
     def __str__(self):
         return "None"
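
The delegation pattern used in the patch above can be illustrated outside of lxml. This is a minimal plain-Python sketch, not lxml's implementation: the class name TextValue is hypothetical and stands in for a string-valued data element. It is written for Python 3, where long and __long__ no longer exist.

```python
class TextValue:
    """Hypothetical stand-in for a string-valued data element."""

    def __init__(self, text):
        self.text = text

    # Each conversion hook simply applies the builtin to the stored text,
    # so non-numeric text raises ValueError exactly as the builtin would.
    def __int__(self):
        return int(self.text)

    def __float__(self):
        return float(self.text)

    def __complex__(self):
        return complex(self.text)


el = TextValue("234")
as_int = int(el)
as_float = float(el)
as_complex = complex(el)
```

Because the hooks only forward to the builtins, explicit conversion works while implicit arithmetic on the element stays unaffected.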


> You mean because of the int/bool duality in Python, but I don't think that's
> something we should easily enable without a compelling use case. Remember that
> it would mean converting the string value "true" to int(1), I don't think
> that's obvious behaviour.

Yes, I was referring to that:

>>> int(True)
1
>>> float(True)
1.0
>>> long(True)
1L
>>> complex
<type 'complex'>
>>> complex(True)
(1+0j)
>>>

I actually wasn't aware of that behaviour of Python booleans.
And this is definitely no priority for me. Then again, one could argue
that BoolElement should behave as much as a native bool in Python,
only that its XML representation is the string value "true".

And there are already subtleties for BoolElement:

>>> root = objectify.fromstring("<root><x>true</x></root>")
>>> type(root.x)
<type 'lxml.objectify.BoolElement'>
>>> root.x.text
'true'
>>> str(root.x)
'True'
>>>


Cheers,

Holger




Stefan Behnel | 8 Apr 09:52

Re: adding __float__, __int__ etc. to objectify.StringElement

Hi,

jholg <at> gmx.de wrote:
> >>> int(True)
> 1
> >>> float(True)
> 1.0
> >>> long(True)
> 1L
> >>> complex
> <type 'complex'>
> >>> complex(True)
> (1+0j)
> 
> I actually wasn't aware of that behaviour of Python booleans.
> And this is definitely no priority for me. Then again, one could argue
> that BoolElement should behave as much as a native bool in Python,
> only that its XML representation is the string value "true".

Hmm, I buy that. As long as the conversion is explicit, I think objectify
Elements /should/ behave as their Python counter types.

I'll check if inheriting from IntElement does the right thing.

Stefan

jholg | 9 Apr 08:29

Re: adding __float__, __int__ etc. to objectify.StringElement

Hi,

>> I actually wasn't aware of that behaviour of Python booleans.
>> And this is definitely no priority for me. Then again, one could argue
>> that BoolElement should behave as much as a native bool in Python,
>> only that its XML representation is the string value "true".
>
> Hmm, I buy that. As long as the conversion is explicit, I think objectify
> Elements /should/ behave as their Python counter types.
>
> I'll check if inheriting from IntElement does the right thing.

 

Maybe the __int__, __float__ etc. methods should even go to the
ObjectifiedDataElement class? Then basically every explicit to-number
conversion for data elements would work right out of the box, if the
corresponding pyval class supports it.

For BoolElement, you'd need to override it anyway, as
int(textOf(self._c_node)) will not work for "true".

 

Holger 
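
The need for a BoolElement-specific override can be seen in a small plain-Python sketch. This is not lxml's BoolElement: the class name BoolValue is hypothetical, and accepting "1" and "0" alongside "true" and "false" is an assumption based on the XML Schema boolean lexical forms (the thread itself only mentions "true").

```python
# Converting the raw text directly fails, which is why an override is needed.
try:
    int("true")
    direct_conversion_failed = False
except ValueError:
    direct_conversion_failed = True


class BoolValue:
    """Hypothetical stand-in for a boolean data element."""

    _TRUE = ("true", "1")
    _FALSE = ("false", "0")

    def __init__(self, text):
        if text not in self._TRUE + self._FALSE:
            raise ValueError("not a boolean literal: %r" % (text,))
        self.text = text

    @property
    def pyval(self):
        # The Python value is derived from the text, not stored separately.
        return self.text in self._TRUE

    # Numeric conversions go through the Python bool, not the raw text.
    def __bool__(self):
        return self.pyval

    def __int__(self):
        return int(self.pyval)

    def __float__(self):
        return float(self.pyval)

    def __str__(self):
        return str(self.pyval)


x = BoolValue("true")
```

Routing every conversion through pyval keeps the XML text ("true") and the Python-side behaviour (True, 1, 1.0) separate, which matches the text/str difference shown earlier in the thread.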



