Christian Heimes | 2 Feb 12:49

etree.clear_error_log() causes segfaults

This is a friendly word of warning! Don't call etree.clear_error_log()
from multiple threads.

We found this issue during stress tests of our CherryPy based
application. Every worker thread was calling etree.clear_error_log()
after the page was rendered. Apparently we hit some sort of race condition.

Backtrace:

C  [etree.so+0x21977]
C  [etree.so+0x22d48]
C  [libxslt.so.1+0xdc17]  xsltTransformError+0xf7
C  [libxslt.so.1+0x839d]
C  [libxslt.so.1+0xa5da]  xsltParseStylesheetProcess+0x80a
C  [libxslt.so.1+0x1efbc]  xsltParseStylesheetInclude+0x1ac
C  [libxslt.so.1+0xa0b4]  xsltParseStylesheetProcess+0x2e4
C  [libxslt.so.1+0xb4b9]  xsltParseStylesheetImportedDoc+0x1e9
C  [libxslt.so.1+0xb5b8]  xsltParseStylesheetDoc+0x28
C  [etree.so+0xe640c]
C  [python2.5+0x4ff83]
C  [python2.5+0x11e67]  PyObject_Call+0x27

Christian
Alexander Shigin | 2 Feb 16:26

String parameters to xslt transformation

lxml currently has no way to pass an external string parameter that
contains both single and double quotes.

The patch adds a `transform` method to the XSLT object with `params` and
`strparams` arguments. `params` works like the `**kw` of the `__call__`
method (i.e. you still need to surround string literals with quotes).

`strparams` are treated literally, so you do not need to use any
escaping or quotes.
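
For context, here is a minimal sketch of a workaround for the existing `**kw` interface, where each value is evaluated as an XPath expression. It is not part of the patch. The helper name `xpath_literal` is hypothetical; it wraps an arbitrary Python string in an XPath 1.0 string expression and falls back to `concat()` when the string contains both kinds of quotes.

```python
def xpath_literal(value):
    """Return an XPath 1.0 expression that evaluates to the string `value`.

    Hypothetical helper for illustration; not an lxml function.
    """
    if "'" not in value:
        return "'%s'" % value          # no single quotes: '...' is safe
    if '"' not in value:
        return '"%s"' % value          # no double quotes: "..." is safe
    # Both kinds of quotes: XPath 1.0 has no escape syntax, so split on
    # single quotes and glue the pieces back together with concat().
    pieces = []
    for index, part in enumerate(value.split("'")):
        if index:
            pieces.append('"\'"')      # the single quote itself, as "'"
        if part:
            pieces.append("'%s'" % part)
    return "concat(%s)" % ", ".join(pieces)


print(xpath_literal('it\'s "both"'))
```

Assuming `transform` is an `etree.XSLT` object, the result could then be passed as `transform(doc, title=xpath_literal(user_input))`.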

Index: src/lxml/xslt.pxd
===================================================================
--- src/lxml/xslt.pxd	(revision 61518)
+++ src/lxml/xslt.pxd	(working copy)
@@ -133,6 +133,11 @@
                                       xsltTransformContext* ctxt) nogil
     cdef xmlDoc* xsltGetProfileInformation(xsltTransformContext* ctxt) nogil

+cdef extern from "libxslt/variables.h":
+    cdef int xsltQuoteUserParams(
+        xsltTransformContext* ctxt,
+        char** params)
+
 cdef extern from "libxslt/extra.h":
     cdef char* XSLT_LIBXSLT_NAMESPACE
     cdef char* XSLT_XALAN_NAMESPACE
Index: src/lxml/xslt.pxi
===================================================================
[...]

Ronny Pfannschmidt | 4 Feb 17:50

altering the indent of the pretty output

Hi,

I'm currently porting gazpacho (a WYSIWYG GTK UI file editor) to lxml.
Unfortunately, the pretty printer uses an indent of 2, and in order
to match the convention I need an indent of 4.

Is there any simple way to achieve pretty-printed output to a file with
an indent of 4?

Regards Ronny
Dirk Rothe | 4 Feb 18:59

Re: altering the indent of the pretty output

On Wed, 04 Feb 2009 17:50:40 +0100, Ronny Pfannschmidt  
<Ronny.Pfannschmidt <at> gmx.de> wrote:

> Hi,
>
> I'm currently porting gazpacho (a WYSIWYG GTK UI file editor) to lxml.
> Unfortunately, the pretty printer uses an indent of 2, and in order
> to match the convention I need an indent of 4.
>
> Is there any simple way to achieve pretty-printed output to a file with
> an indent of 4?

You could adapt the following XSL Transformation:

prettyXSL = """<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output encoding="UTF-8" method="xml"/>
    <xsl:param name="indent-increment" select="'   '"/>
    <xsl:template match="*|comment()|processing-instruction()">
        <xsl:param name="indent" select="'&#xA;'"/>
        <xsl:value-of select="$indent"/>
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:apply-templates>
                <xsl:with-param name="indent" select="concat($indent, $indent-increment)"/>
            </xsl:apply-templates>
            <xsl:if test="*">
[...]


how knowing the types return by .xpath

Example: 

from lxml import etree

f = open(options.file).read()

hparser = etree.HTMLParser(encoding='utf-8', remove_comments=True)
etree_document = etree.HTML(f, parser=hparser)

elems = etree_document.xpath(strxpath)

for frags in elems:
    print type(frags)
(...)

If strxpath is //h1/following-sibling::text(), this prints
<type 'lxml.etree._ElementUnicodeResult'>

If strxpath is //div[@class="news"], it prints
<type 'lxml.etree._Element'>

How do I write an "if" to detect the type?

thanks 
-- 
Sérgio M. B.

Stefan Behnel | 4 Feb 20:47

Re: etree.clear_error_log() causes segfaults

Hi,

Christian Heimes wrote:
> This is a friendly word of warning! Don't call etree.clear_error_log()
> from multiple threads.
> 
> We found this issue during stress tests of our CherryPy based
> application. Every worker thread was calling etree.clear_error_log()
> after the page was rendered. Apparently we hit some sort of race condition.

Thanks for the heads-up. The global error log is not thread-local: all
threads share it, so there's not much sense in calling the function above
in threaded code, or even in using the global log at all.

The log that comes with API objects like XPath, XSLT and validators should
work as expected, though.

Stefan
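
To illustrate the per-object logs mentioned above, here is a minimal sketch in which each worker thread builds its own validator and reads only that object's `error_log`, never touching the global log. The schema, the sample document and the names `XSD` and `render` are made up for illustration; the example targets Python 3 and assumes lxml is installed.

```python
import threading

from lxml import etree

# Hypothetical schema: a single <page> element holding an integer.
XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="page" type="xs:integer"/>
</xs:schema>"""


def render(results, index):
    # One validator per thread; its error_log belongs to this object only.
    schema = etree.XMLSchema(etree.XML(XSD))
    doc = etree.XML(b"<page>not-a-number</page>")
    valid = schema.validate(doc)
    # Collect the messages from the validator's own log.
    messages = [entry.message for entry in schema.error_log]
    results[index] = (valid, messages)


results = [None] * 4
threads = [threading.Thread(target=render, args=(results, i)) for i in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

for valid, messages in results:
    print(valid, messages)
```

Because every thread owns its validator, there is nothing to clear between requests: the log is discarded together with the object.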
Stefan Behnel | 4 Feb 20:53

Re: altering the indent of the pretty output


Ronny Pfannschmidt wrote:
> I'm currently porting gazpacho (a WYSIWYG GTK UI file editor) to lxml.
> Unfortunately, the pretty printer uses an indent of 2, and in order
> to match the convention I need an indent of 4.
>
> Is there any simple way to achieve pretty-printed output to a file with
> an indent of 4?

Apart from the already proposed XSLT serialisation, libxml2 does have a way
to set the indentation level. However, that setting is global to each
thread and isn't currently exposed at lxml's API level.

I'd accept patches that support it for a single call to ET.write() and
tostring().

Stefan
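
Another option that needs neither XSLT nor the libxml2 setting is to write the indentation into the tree before serialising it without pretty printing. The following is a sketch of that whitespace recipe for ElementTree-style trees. The function name `indent` and the sample document are illustrative, and the standard library's `xml.etree.ElementTree` is used as a stand-in so the example runs without lxml; `lxml.etree` elements offer the same `text`/`tail` API.

```python
import xml.etree.ElementTree as ET  # stand-in; lxml.etree has the same element API


def indent(elem, space="    ", level=0):
    """Rewrite whitespace-only text/tail in place to indent the tree by `space`."""
    pad = "\n" + level * space           # line start for elem's closing tag
    children = list(elem)
    if not children:
        return                           # leaf: its text is left untouched
    if not (elem.text or "").strip():
        elem.text = pad + space          # line start for the first child
    for child in children:
        indent(child, space, level + 1)
        if not (child.tail or "").strip():
            child.tail = pad + space     # line start for the next sibling
    if not (children[-1].tail or "").strip():
        children[-1].tail = pad          # last child: back out to elem's level


root = ET.fromstring(
    "<interface><object><property>x</property></object><object/></interface>"
)
indent(root)
print(ET.tostring(root, encoding="unicode"))
```

Text that contains anything other than whitespace is preserved, so mixed content is not reflowed. The tree should then be serialised without `pretty_print`, so that only the whitespace written here ends up in the output.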
Stefan Behnel | 4 Feb 20:56

Re: how knowing the types return by .xpath

Hi,

Sergio Monteiro Basto wrote:
> Example: 
> 
> from lxml import etree
> 
> f = open(options.file).read()
> 
> hparser = etree.HTMLParser(encoding='utf-8', remove_comments=True)
> etree_document = etree.HTML(f, parser=hparser)
> 
> elems = etree_document.xpath(strxpath)
>
> for frags in elems:
>     print type(frags)
> (...)
>
> If strxpath is //h1/following-sibling::text(), this prints
> <type 'lxml.etree._ElementUnicodeResult'>
>
> If strxpath is //div[@class="news"], it prints
> <type 'lxml.etree._Element'>
>
> How do I write an "if" to detect the type?

It's actually rare that the expected type isn't known in advance, but for
this kind of use case, you can just test the type as usual, i.e. use
isinstance() to check for basestring, float or list.
[...]

Dirk Rothe | 4 Feb 21:15

Re: how knowing the types return by .xpath

On Wed, 04 Feb 2009 20:56:40 +0100, Stefan Behnel <stefan_ml <at> behnel.de>  
wrote:

> Hi,
>
> Sergio Monteiro Basto wrote:
>> Example:
>>
>> from lxml import etree
>>
>> f = open(options.file).read()
>>
>> hparser = etree.HTMLParser(encoding='utf-8', remove_comments=True)
>> etree_document = etree.HTML(f, parser=hparser)
>>
>> elems = etree_document.xpath(strxpath)
>>
>> for frags in elems:
>>     print type(frags)
>> (...)
>>
>> If strxpath is //h1/following-sibling::text(), this prints
>> <type 'lxml.etree._ElementUnicodeResult'>
>>
>> If strxpath is //div[@class="news"], it prints
>> <type 'lxml.etree._Element'>
>>
>> How do I write an "if" to detect the type?
[...]

Stefan Behnel | 4 Feb 21:20

Re: how knowing the types return by .xpath


Dirk Rothe wrote:
> On Wed, 04 Feb 2009 20:56:40 +0100, Stefan Behnel <stefan_ml <at> behnel.de>
> wrote:
>> It's actually rare that the expected type isn't known in advance, but for
>> this kind of use case, you can just test the type as usual, i.e. use
>> isinstance() to check for basestring, float or list.
> 
> ..or scalar bools.

True. See the docs for details.

http://codespeak.net/lxml/xpathxslt.html#xpath-return-values

Stefan
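
The isinstance() checks discussed in this thread can be sketched as follows. The example is written for Python 3, where `str` takes the place of `basestring`; the function name `describe_xpath_result` and its labels are made up for illustration. It relies only on the result types named in the thread (strings, floats, lists and bools) and on lxml's string results being subclasses of the built-in string type.

```python
def describe_xpath_result(result):
    """Label the value returned by an .xpath() call (illustrative helper)."""
    # Test bool first so that True/False are never mistaken for numbers.
    if isinstance(result, bool):
        return "boolean"
    if isinstance(result, float):
        return "number"
    if isinstance(result, str):
        return "string"
    if isinstance(result, list):
        # A node-set: text and attribute results are string subclasses,
        # everything else is treated as a node (e.g. an element).
        return ["text" if isinstance(item, str) else "node" for item in result]
    raise TypeError("unexpected XPath result type: %r" % type(result))


print(describe_xpath_result(True))
print(describe_xpath_result(2.0))
print(describe_xpath_result(["some text", object()]))
```

With a parsed document, the same function could be applied to the two expressions from the question, e.g. `describe_xpath_result(etree_document.xpath('//h1/following-sibling::text()'))`.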
