Martin Aspeli | 2 Nov 03:58 2009

Critical crashes on Windows under high load

Hi folks,

We have an incredibly frustrating, show-stopping problem using lxml (under
Deliverance, in front of a repoze.zope2 pipeline serving up a Plone site) on
Windows.

Under high load, the Python process crashes. There is no traceback in the log,
so I can't identify where it actually happens, but we get a Windows error
dialogue saying python.exe (or pythonservice.exe if running as a Windows
service) has crashed in etree.pyd (at some binary address, no line numbers or
function references).

The Deliverance (0.3/trunk) rules use fairly complex xpath expressions. We're
trying to simplify these, but there's nothing obviously wrong, and in any case
it shouldn't crash.

We've tried to run both multi-threaded and single-threaded 'paster' processes:
the problem happens with both. I did read somewhere that it's possible to build
a single-threaded lxml egg (?), but I haven't found one.

We would be incredibly grateful for any help with (a) debugging and (b)
resolving this. At present, we're having to fight a lot of nervousness regarding
the production-worthiness of our Deliverance/lxml based solution, which is
rather unfortunate. :-(

Cheers,
Martin
David Shieh | 2 Nov 08:04 2009

About encoding question !

Hey guys,

I recently started using lxml for my HTML parsing. It's really great, and indeed the fastest compared to other libraries.

But since I began parsing some other pages that use the gb2312 encoding, I've had a problem. The output is here: http://david-paste.cn/paste/50/

Please help me with this. Thank you, guys.

Regards,
David


--
----------------------------------------------
Attitude determines everything !
----------------------------------------------


_______________________________________________
lxml-dev mailing list
lxml-dev <at> codespeak.net
http://codespeak.net/mailman/listinfo/lxml-dev
Stefan Behnel | 2 Nov 14:50 2009

Re: Critical crashes on Windows under high load


Martin Aspeli, 02.11.2009 03:58:
> We have an incredibly frustrating, show-stopping problem using lxml

I assume you are using lxml 2.2.2?

> [...] on Windows.

And now we have two problems...

> Under high load, the Python process crashes. There is no traceback in the log,
> so I can't identify where it actually happens, but we get a Windows error
> dialogue saying python.exe (or pythonservice.exe if running as a Windows
> service) has crashed in etree.pyd (at some binary address, no line numbers or
> function references).

I do not build the Windows binaries myself, so I have no idea if there are
any debug symbols in there. Would certainly be nice to have them.

> The Deliverance (0.3/trunk) rules use fairly complex xpath expressions. We're
> trying to simplify these, but there's nothing obviously wrong, and in any case
> it shouldn't crash.

XPath shouldn't crash by itself, so I'd rather focus the debugging on the
other things you are doing. Are you running the XPath queries against trees
that are being modified concurrently? Did you check for memory problems?

Could you try to come up with a stripped down set of operations that your
code does using lxml? And which of them happen concurrently?

> We've tried to run both multi-threaded and single-threaded 'paster' processes:
> the problem happens with both.

Does that mean that this happens even if you run everything single-threaded?

> We would be incredibly grateful for any help with (a) debugging and (b)
> resolving this. At present, we're having to fight a lot of nervousness regarding
> the production-worthiness of our Deliverance/lxml based solution, which is
> rather unfortunate. :-(

Certainly.

Stefan
Piet van Oostrum | 2 Nov 14:37 2009

Re: About encoding question !

>>>>> David Shieh <mykingheaven <at> gmail.com> (DS) wrote:

>DS> Hey guys,
>DS> I recently use lxml to do my HTML parsing, it's really great, and
>DS> indeed the fastest one compare to other libraries.

>DS> But since I begin to parse some other pages using gb2312 coding, I've a
>DS> problem. The output is in here: http://david-paste.cn/paste/50/

Firstly, HTML is not XML. XHTML is, however. So if your input is not
XHTML, you should use an HTML parser instead of the XML parser.

From the first error message it seems that you have a byte string as
input, not a Unicode string (this also seems to be implied by your
message: 'pages using gb2312 coding'). If you feed these to the XML
parser they should contain an encoding declaration, like:

<?xml version="1.0" encoding="gb2312"?>

Otherwise the parser thinks it is utf-8, as the error message
indicates.

contents.encode('utf-8') doesn't make sense when contents contains a
byte string. This would only make sense when it contains a Unicode
string. Neither does contents.encode('gb2312').

contents.decode('utf-8') is wrong if contents does not contain a utf-8
encoded byte string. However, contents.decode('gb2312') would make
sense if contents contains a gb2312 encoded byte string. This will
deliver a Unicode string that you can pass to etree.fromstring. So
etree.fromstring(contents.decode('gb2312')) could be an alternative
for specifying gb2312 in the file itself.
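
A minimal sketch of the decode-first approach described above. The sample document and the variable names are made up for illustration, and the fallback to the standard library's ElementTree is only an assumption so that the snippet also runs where lxml is not installed:

```python
try:
    from lxml import etree
except ImportError:
    # Assumption for portability: the stdlib parser behaves the same
    # way for the calls used in this sketch.
    import xml.etree.ElementTree as etree

# Simulate a page that arrives as a gb2312-encoded byte string
# without an encoding declaration.
text = u"<doc><title>\u4e2d\u6587</title></doc>"
contents = text.encode("gb2312")

# Feeding the raw bytes to the XML parser fails, because without a
# declaration the parser assumes UTF-8.
try:
    etree.fromstring(contents)
    failed_as_utf8 = False
except SyntaxError:  # both parsers' parse errors derive from SyntaxError
    failed_as_utf8 = True

# Decoding first yields a Unicode string that the parser accepts.
root = etree.fromstring(contents.decode("gb2312"))
title = root.findtext("title")
```

Note that the decoded string must not itself carry an encoding declaration; lxml refuses Unicode input that declares an encoding.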

-- 
Piet van Oostrum <piet <at> vanoostrum.org>
WWW: http://pietvanoostrum.com/
PGP key: [8DAE142BE17999C4]
Nicholas Dudfield | 2 Nov 14:53 2009

Re: lxml 2.2.3 released

Stefan & Co,

Great news on 2.2.3.

Any ETA for windows 2.5 binaries?

Cheers.
Martin Aspeli | 2 Nov 15:24 2009

Re: Critical crashes on Windows under high load

Stefan Behnel wrote:
> Martin Aspeli, 02.11.2009 03:58:
>> We have an incredibly frustrating, show-stopping problem using lxml
> 
> I assume you are using lxml 2.2.2?

Yes, though we also tried the latest in the 2.0.x line as a downgrade 
for a bit. Same problem.

>> [...] on Windows.
> 
> And now we have two problems...
> 
> 
>> Under high load, the Python process crashes. There is no traceback in the log,
>> so I can't identify where it actually happens, but we get a Windows error
>> dialogue saying python.exe (or pythonservice.exe if running as a Windows
>> service) has crashed in etree.pyd (at some binary address, no line numbers or
>> function references).
> 
> I do not build the Windows binaries myself, so I have no idea if there are
> any debug symbols in there. Would certainly be nice to have them.

Who does? Sidnei?

>> The Deliverance (0.3/trunk) rules use fairly complex xpath expressions. We're
>> trying to simplify these, but there's nothing obviously wrong, and in any case
>> it shouldn't crash.
> 
> XPath shouldn't crash by itself, so I'd rather focus the debugging on the
> other things you are doing. Are you running the XPath queries against trees
> that are being modified concurrently?

It's possible that Deliverance is doing something evil here, but I kind 
of doubt it. As far as I can tell, this is a Windows-specific problem, 
or at least no-one seems to have reported it on Unix.

> Did you check for memory problems?

How would I do that?

> Could you try to come up with a stripped down set of operations that your
> code does using lxml? And which of them happen concurrently?

I'm not sure. It'd be difficult. The crash dialogue doesn't tell me 
where in lxml the problem is (since there's no stack trace). Deliverance 
is doing a fair amount of work with lxml (evaluating xpath expressions, 
parsing the two input trees (theme + content), modifying the output 
tree). So far, we've not been able to pinpoint exactly where it happens, 
or if it's even deterministic.

>> We've tried to run both multi-threaded and single-threaded 'paster' processes:
>> the problem happens with both.
> 
> Does that mean that this happens even if you run everything single-threaded?

We put the paster processes under which the WSGI pipeline runs into 
single threaded mode (or at least, we set the threadpool size of each 
process to 1), so in theory, there shouldn't be any concurrency. I don't 
know if that's actually the case, though.

I guess the most constructive thing would be if I could find some better 
way of debugging this. People closer to the project (and server) where 
this is happening are working on a load test suite that can reproduce 
this reliably, though it's pretty much trial and error. The problem is 
that as of right now, I don't know what I'd do next even if they did 
make it occur reliably.

I don't understand how lxml is built, how Cython works, how to write C 
extensions, or how to do C development on Windows. It's a loooong time 
since I wrote C/C++ and that was on Linux. ;-)

Martin
Martin Aspeli | 2 Nov 15:27 2009

Re: 2.2.2 binary egg for Mac OS X 10.6

Martin Aspeli wrote:
> Martin Aspeli <optilude <at> gmail.com> writes:
>  
>>> Is there any chance we could have 10.6 eggs? If there are reliable build 
>>> instructions, I can help build them.
> 
> Okay, I finally got this to work using zc.buildout and z3c.recipe.staticlxml.
> 
> I have eggs for Python 2.4 and 2.6. The build for Python 2.5 is failing in
> mysterious ways (it says "no egg found" in the temp directory).
> 
> Can I have PyPI access (username 'optilude') to upload these? Otherwise, can I
> send them somewhere for someone else?

Actually, I'm not sure that these *do* work. I think I need to defer to 
Stefan Eletzhofer or someone else with a bit more experience of doing 
this right.

It is really frustrating. People get stuck on this almost on a daily 
basis trying to use some of the new Plone tools we have that depend on 
lxml. :(

I realise it's not lxml's fault, but unfortunately it's something that 
lxml will have to fix, since Apple aren't going to. ;-)

Martin

-- 
Author of `Professional Plone Development`, a book for developers who
want to work with Plone. See http://martinaspeli.net/plone-book
Stefan Behnel | 2 Nov 16:01 2009

Re: Critical crashes on Windows under high load


Martin Aspeli, 02.11.2009 15:24:
> Stefan Behnel wrote:
>> Martin Aspeli, 02.11.2009 03:58:
>>> Under high load, the Python process crashes. There is no traceback in the log,
>>> so I can't identify where it actually happens, but we get a Windows error
>>> dialogue saying python.exe (or pythonservice.exe if running as a Windows
>>> service) has crashed in etree.pyd (at some binary address, no line numbers or
>>> function references).
>> I do not build the Windows binaries myself, so I have no idea if there are
>> any debug symbols in there. Would certainly be nice to have them.
> 
> Who does? Sidnei?

Yes.

>>> The Deliverance (0.3/trunk) rules use fairly complex xpath expressions. We're
>>> trying to simplify these, but there's nothing obviously wrong, and in any case
>>> it shouldn't crash.
>> XPath shouldn't crash by itself, so I'd rather focus the debugging on the
>> other things you are doing. Are you running the XPath queries against trees
>> that are being modified concurrently?
> 
> It's possible that Deliverance is doing something evil here, but I kind 
> of doubt it. As far as I can tell, this is a Windows-specific problem, 
> or at least no-one seems to have reported it on Unix.

So I assume you ran similar load tests under Unix systems?

>> Did you check for memory problems?
> 
> How would I do that?

I mean, does the process' memory usage grow uncontrolled? If it's running
out of memory, it's quite possible that it crashes. Not all memory errors
can be handled safely.

>> Could you try to come up with a stripped down set of operations that your
>> code does using lxml? And which of them happen concurrently?
> 
> I'm not sure. It'd be difficult.

Who said debugging would come for free?

> The crash dialogue doesn't tell me 
> where in lxml the problem is (since there's no stack trace). Deliverance 
> is doing a fair amount of work with lxml (evaluating xpath expressions, 
> parsing the two input trees (theme + content), modifying the output 
> tree).

Is that one tree per thread or are trees being handled by multiple threads?
If threads don't share data, it can't be a threading issue (at least not
from the POV of lxml).

>>> We've tried to run both multi-threaded and single-threaded 'paster' processes:
>>> the problem happens with both.
>> Does that mean that this happens even if you run everything single-threaded?
> 
> We put the paster processes under which the WSGI pipeline runs into 
> single threaded mode (or at least, we set the threadpool size of each 
> process to 1), so in theory, there shouldn't be any concurrency. I don't 
> know if that's actually the case, though.

It would be helpful if you could find out. In the worst case, you can
inject a WSGI layer that simply acquires a lock while it forwards the
request. Then you're sure it's single threaded.
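
One way such a locking layer could look. This is only a sketch and not part of Paste or Deliverance: the class name SerializingMiddleware and the demo application are hypothetical, and the whole response body is buffered so that the lock also covers any work done while the body is generated:

```python
import threading


class SerializingMiddleware(object):
    """WSGI middleware that lets only one request run at a time."""

    def __init__(self, app):
        self.app = app
        self.lock = threading.Lock()

    def __call__(self, environ, start_response):
        # Hold the lock for the entire request, including the time it
        # takes to produce the response body.
        with self.lock:
            result = self.app(environ, start_response)
            try:
                # Materialise the body before releasing the lock.
                return list(result)
            finally:
                close = getattr(result, "close", None)
                if close is not None:
                    close()


# Hypothetical stand-in for the real pipeline (Deliverance + backend).
def demo_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


statuses = []
wrapped = SerializingMiddleware(demo_app)
body = wrapped({}, lambda status, headers: statuses.append(status))
```

Buffering costs memory and latency, so this is meant for narrowing down a crash, not for production use. If the crashes persist with this wrapper outermost in the pipeline, concurrent access from request threads can be ruled out.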

> I guess the most constructive thing would be if I could find some better 
> way of debugging this. People closer to the project (and server) where 
> this is happening are working on a load test suite that can reproduce 
> this reliably, though it's pretty much trial and error. The problem is 
> that as of right now, I don't know what I'd do next even if they did 
> make it occur reliably.

Well, at least, if it can be reproduced, it can be tracked down and fixed.
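
For readers on current Python versions (3.3 and later), the standard library's faulthandler module is one way to get at least a Python-level traceback when the interpreter dies in native code. It did not exist for the Python versions discussed in this thread, so this is a sketch for newer setups; the log file name is arbitrary:

```python
import faulthandler
import os
import tempfile

# Hypothetical location for the crash log; any writable path works.
log_path = os.path.join(tempfile.gettempdir(), "lxml_crash_traceback.log")

# The file object must stay open for the lifetime of the process,
# because faulthandler writes to its file descriptor at crash time.
crash_log = open(log_path, "w")

# On a fatal error (e.g. an access violation), dump the Python stack
# of every thread to the log before the process goes down.
faulthandler.enable(file=crash_log, all_threads=True)
```

The dump shows which Python call was active, not the C function inside etree.pyd, but that is usually enough to tell whether the crash happens during parsing, XPath evaluation or tree modification.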

> I don't understand how lxml is built, how Cython works, how to write C 
> extensions, or how to do C development on Windows. It's a loooong time 
> since I wrote C/C++ and that was on Linux. ;-)

Luckily, you don't have to. lxml is written in Cython, not in C.

Stefan
Martin Aspeli | 2 Nov 16:11 2009

Re: Critical crashes on Windows under high load

Stefan Behnel wrote:

>>>> The Deliverance (0.3/trunk) rules use fairly complex xpath expressions. We're
>>>> trying to simplify these, but there's nothing obviously wrong, and in any case
>>>> it shouldn't crash.
>>> XPath shouldn't crash by itself, so I'd rather focus the debugging on the
>>> other things you are doing. Are you running the XPath queries against trees
>>> that are being modified concurrently?
>> It's possible that Deliverance is doing something evil here, but I kind 
>> of doubt it. As far as I can tell, this is a Windows-specific problem, 
>> or at least no-one seems to have reported it on Unix.
> 
> So I assume you ran similar load tests under Unix systems?

No, I wish we could. :(

I'm basing this on the fact that (a) Unix deployments seem more common 
(b) no-one has reported this on Unix that I can see and (c) I've found 
at least one other person with Windows crashes.

But who knows, I could be completely wrong. What I can say for certain 
is that the crashes do occur from time to time under relatively normal 
usage patterns.

>>> Did you check for memory problems?
>> How would I do that?
> 
> I mean, does the process' memory usage grow uncontrolled? If it's running
> out of memory, it's quite possible that it crashes. Not all memory errors
> can be handled safely.

We normally discover the error only after the process has crashed. 
There's no pre-warning.

It looks like memory usage is relatively stable when the system is 
running normally. I'll try to take a closer look, though.

>>> Could you try to come up with a stripped down set of operations that your
>>> code does using lxml? And which of them happen concurrently?
>> I'm not sure. It'd be difficult.
> 
> Who said debugging would come for free?

Heh, true. A *lot* of time has gone into this already. We're talking 
about a fairly big stack here, though. What I think we'll try is to 
reproduce the problem with a load test suite and a static 
back end instead of having Plone in the mix. That should produce a 
relatively small WSGI pipeline and a manageable amount of code. If it 
still crashes, of course.

>> The crash dialogue doesn't tell me 
>> where in lxml the problem is (since there's no stack trace). Deliverance 
>> is doing a fair amount of work with lxml (evaluating xpath expressions, 
>> parsing the two input trees (theme + content), modifying the output 
>> tree).
> 
> Is that one tree per thread or are trees being handled by multiple threads?
> If threads don't share data, it can't be a threading issue (at least not
> from the POV of lxml).

One per thread almost certainly. They're read on each request as far as 
I can tell.

I'd have to defer to the Deliverance developers, though.

>>>> We've tried to run both multi-threaded and single-threaded 'paster' processes:
>>>> the problem happens with both.
>>> Does that mean that this happens even if you run everything single-threaded?
>> We put the paster processes under which the WSGI pipeline runs into 
>> single threaded mode (or at least, we set the threadpool size of each 
>> process to 1), so in theory, there shouldn't be any concurrency. I don't 
>> know if that's actually the case, though.
> 
> It would be helpful if you could find out. In the worst case, you can
> inject a WSGI layer that simply acquires a lock while it forwards the
> request. Then you're sure it's single threaded.

Does anyone know? We're using Paste#httpserver and set threadpool_count 
= 1. I assume that means single threaded?

>> I guess the most constructive thing would be if I could find some better 
>> way of debugging this. People closer to the project (and server) where 
>> this is happening are working on a load test suite that can reproduce 
>> this reliably, though it's pretty much trial and error. The problem is 
>> that as of right now, I don't know what I'd do next even if they did 
>> make it occur reliably.
> 
> Well, at least, if it can be reproduced, it can be tracked down and fixed.

Yeah. That's basically what we're working towards now. But it's not 
straightforward, at least not in a way that we can give to other people 
to look at.

>> I don't understand how lxml is built, how Cython works, how to write C 
>> extensions, or how to do C development on Windows. It's a loooong time 
>> since I wrote C/C++ and that was on Linux. ;-)
> 
> Luckily, you don't have to. lxml is written in Cython, not in C.

But libxml2 and libxslt are. I suppose it's conceivable the problem is 
there, or in the way they're statically linked perhaps? Not that I 
understand Cython either. ;-)

Thanks for your help!

Martin

-- 
Author of `Professional Plone Development`, a book for developers who
want to work with Plone. See http://martinaspeli.net/plone-book
Emanuele D'Arrigo | 2 Nov 16:29 2009

Re: Handling processing instructions

Stefan, I don't know if you missed this thread:

is it possible to remove a processing instruction that is a preceding sibling of the root node of an ElementTree? Somehow I can access it via tree.getroot().getprevious() or tree.getroot().itersiblings(preceding=True).next() but I cannot find a way to delete it.

A test case to cut&paste:

from io import BytesIO
from lxml import etree

# A processing instruction that precedes the root element.
tree = etree.parse(BytesIO(b"<?aProcInstrToBeDeleted?><aRoot />"))

Thank you!

Manu
