Stefan Behnel | 1 Mar 2008 09:33

Re: Setting URL from lxml.html.fromstring, etc

Hi Ian,

Ian Bicking wrote:
> OK.  Then would the html base attribute just be a read-only property
> then?  Like:
> 
>   def base(self):
>       return super(HtmlElement, self).base
>   base = property(base)
>
> I'm not terribly concerned about whether it is read-only or not.  It's a
> little fuzzy, since HTML is parsed to the lxml representation, and
> though it will probably be serialized to HTML again (if it is serialized
> at all) and HTML doesn't have anything like xml:base, the lxml
> representation is not itself exactly HTML.  And if you serialize to
> XHTML, then xml:base is available.

Hmm, true. However, if you use lxml.html, you're likely to stay in the HTML
world, so I would prefer making this read-only. If you really want an xml:base
attribute, you can set it yourself, and if you really want to set the document
URL, it's better to be explicit than setting it through an Element.
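
A minimal sketch of the read-only variant discussed here. It uses stand-in classes rather than lxml's real ones: `BaseElement` and its `_url` field are hypothetical, and only the `property` pattern mirrors Ian's snippet.

```python
class BaseElement:
    """Hypothetical stand-in for the parent element class."""

    def __init__(self, url=None):
        self._url = url  # assumed storage for the document URL

    @property
    def base(self):
        return self._url

    @base.setter
    def base(self, url):
        self._url = url


class HtmlElement(BaseElement):
    # Redefining the property with only a getter makes it read-only
    # on this subclass, while the parent class stays writable.
    @property
    def base(self):
        return super().base


el = HtmlElement("http://example.com/index.html")
print(el.base)
try:
    el.base = "http://example.com/other.html"
except AttributeError as exc:
    print("read-only:", exc)
```

Because the subclass property has no setter, assignment raises `AttributeError`, so the document URL would have to be changed explicitly elsewhere.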

> Also translating HTML to XHTML is kind of an outstanding issue for
> lxml.html, and it seems reasonable to me that XHTML could be parsed into
> the same classes as HTML.  The only real caveat there is that XHTML uses
> different (namespaced) tag names.  If you remove the tag names, then the
> classes and the lookup applies just fine.  (Presumably the lookup could
> be changed to support XHTML fairly easily.)

That's a different topic, so I think we should discuss that in a separate thread.

Stefan Behnel | 1 Mar 2008 09:49

XHTML handling in lxml.html

Ian Bicking wrote:
> translating HTML to XHTML is kind of an outstanding issue for lxml.html,
> and it seems reasonable to me that XHTML could be parsed into the same
> classes as HTML.  The only real caveat there is that XHTML uses different
> (namespaced) tag names.

I agree that there is more we could do. For example, we could add "xhtml" as a
serialisation method and do stuff internally to add a namespace declaration to
the serialised "<html>" (iff there isn't a namespace declared already). I'm
not sure if it would be an error if the tree contains non-HTML elements, I
guess we could just leave that to the user.

> If you remove the tag names, then the classes and
> the lookup applies just fine.  (Presumably the lookup could be changed to
> support XHTML fairly easily.)

I would say so, yes. There would also be issues with the XPath expressions in
things like html.clean, I assume. It would definitely be a good thing if the
whole machinery could handle namespace-free HTML and namespaced XHTML equally
well.
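
To illustrate the tag-name caveat, here is a small sketch using the standard library's `xml.etree.ElementTree` as a stand-in (lxml uses the same `{namespace}tag` notation). The helper `html_tag_name` is hypothetical and not part of lxml.html.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def html_tag_name(tag):
    """Return the plain HTML tag name for an HTML or XHTML element tag."""
    prefix = "{%s}" % XHTML_NS
    if tag.startswith(prefix):
        return tag[len(prefix):]
    return tag


xhtml = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml"><body><p>Hi</p></body></html>'
)
html = ET.fromstring("<html><body><p>Hi</p></body></html>")

# Namespaced XHTML tags differ from plain HTML tags...
print([el.tag for el in xhtml.iter()])
# ...but map to the same names once the namespace is stripped.
print([html_tag_name(el.tag) for el in xhtml.iter()])

# Path expressions written for plain HTML do not match XHTML elements.
print(html.find(".//p") is not None)
print(xhtml.find(".//p") is not None)
print(xhtml.find(".//{%s}p" % XHTML_NS) is not None)
```

A class lookup keyed on the stripped name could serve both trees, while path expressions such as those in html.clean would need a namespace-aware form.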

Stefan

Sidnei da Silva | 1 Mar 2008 12:26

Re: Using EXSLT extensions on Windows with standard lxml binaries

On Fri, Feb 29, 2008 at 5:32 AM, Stefan Behnel <stefan_ml <at> behnel.de> wrote:
>  > What should we do? Release new builds of 1.3.x with updated libxslt?
>
>  This is not a critical problem, so I wouldn't do a re-release. If you can
>  build 2.0.2 with a newer libxslt, that's just fine. I currently don't have the
>  time to backport fixes for a 1.3.7 release, but once that gets done, we'll
>  have that problem sorted out as well.

Ok, 2.0.2 is up.

>  Is there a way you could document the libxml2/libxslt versions used when
>  uploading binaries? Like, in the file comment on PyPI?

Right now, only if I do it manually, or if I override the 'upload'
setuptools command. There's no command-line or setup.py option for
specifying what the comment will be; it is hardcoded inside the
'upload' command.

-- 
Sidnei da Silva
Enfold Systems                http://enfoldsystems.com
Fax +1 832 201 8856     Office +1 713 942 2377 Ext 214

Stefan Behnel | 1 Mar 2008 17:06

Re: Segfault and bus error when importing lxml.html.clean after importing webbrowser

Hi,

Jon Rosebaugh wrote:
> I was trying to use lxml.html.clean to sanitize comments in my blog.
> Unfortunately, although I can import and use it in a standalone
> console session, it fails within the webapp. Sometimes it segfaults,
> and sometimes it's a bus error instead.
> After going through all the imports to see what _they_ imported, I
> finally tracked down a minimal example that can cause the problem:
> 
> import webbrowser
> import lxml.html.clean
> 
> If I reverse the order of imports, everything works fine, so for the
> moment I've worked around it by making sure that lxml.html.clean is
> imported the very first thing.

The problem has been investigated. Apparently, importing the webbrowser module
can dynamically load the libxml2 library. As only lxml was built against the
updated libraries, this first import will load the older system libraries,
which then conflict with the libraries that lxml requires.

https://bugs.launchpad.net/lxml/+bug/197243

This problem is due to a misconfigured system that uses conflicting library
versions, so there is nothing lxml can do here.
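
One way to spot such a mismatch, sketched under the assumption that the import itself survives: lxml.etree exposes both the library versions it was compiled against and the ones actually loaded. The helper `versions_differ` is hypothetical; differing numbers are only a hint of a conflict, not proof.

```python
def versions_differ(compiled, loaded):
    """True if the runtime library version differs from the compile-time one."""
    return tuple(compiled) != tuple(loaded)


try:
    from lxml import etree
except ImportError:
    etree = None  # lxml not installed; the helper still works on plain tuples

if etree is not None:
    for name, compiled, loaded in [
        ("libxml2", etree.LIBXML_COMPILED_VERSION, etree.LIBXML_VERSION),
        ("libxslt", etree.LIBXSLT_COMPILED_VERSION, etree.LIBXSLT_VERSION),
    ]:
        status = "MISMATCH" if versions_differ(compiled, loaded) else "ok"
        print("%-8s compiled %s, loaded %s: %s" % (name, compiled, loaded, status))
```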

Stefan

Stefan Behnel | 2 Mar 2008 10:08

XSLT extension elements landed on trunk

Hi,

the current trunk now supports XSLT extension elements implemented in
Python. It's sort of a sandbox environment with read-only Elements, where
you can do basically anything based on the stylesheet and the input document,
and then append some result subtree to the XSLT output tree.

Here's a short XSLT snippet that uses an extension, and a Python class that
provides such an extension:

  <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:myns="testns"
        extension-element-prefixes="myns">
    <xsl:template match="a">
      <A>
        <myns:myext><x>X</x><y>Y</y><z/></myns:myext>
      </A>
    </xsl:template>
  </xsl:stylesheet>

  class MyExt(etree.XSLTExtension):
      def execute(self, context, self_node, input_node, output_parent):
          # apply templates to my own children and process the result
          for child in self_node:
              for result in self.apply_templates(context, child):
                  if isinstance(result, basestring):
                      el = etree.Element("T")
                      el.text = result
                  else:
  [...]
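
The class above is cut off in this archive. A self-contained sketch along the same lines follows, assuming the `extensions=` keyword of `etree.XSLT` with `(namespace, name)` keys as documented for lxml. The `else` branch, the input document, and the use of `str` in place of Python 2's `basestring` are assumptions, not the original message.

```python
from lxml import etree

XSLT_SOURCE = b"""\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:myns="testns"
    extension-element-prefixes="myns">
  <xsl:template match="a">
    <A>
      <myns:myext><x>X</x><y>Y</y><z/></myns:myext>
    </A>
  </xsl:template>
</xsl:stylesheet>
"""


class MyExt(etree.XSLTExtension):
    def execute(self, context, self_node, input_node, output_parent):
        # Apply templates to the extension element's own children
        # and append each result to the output tree.
        for child in self_node:
            for result in self.apply_templates(context, child):
                if isinstance(result, str):
                    # Text results are wrapped in a <T> element.
                    el = etree.Element("T")
                    el.text = result
                else:
                    # Assumed continuation: pass element results through.
                    el = result
                output_parent.append(el)


# Register the extension under its (namespace, local name) pair.
extensions = {("testns", "myext"): MyExt()}
transform = etree.XSLT(etree.XML(XSLT_SOURCE), extensions=extensions)

# Hypothetical input document with a matching <a> root.
result = transform(etree.XML(b"<a><b>B</b></a>"))
print(etree.tostring(result.getroot()))
```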

Stefan Behnel | 3 Mar 2008 08:22

Re: Using EXSLT extensions on Windows with standard lxml binaries

Hi Sidnei,

Sidnei da Silva wrote:
> On Fri, Feb 29, 2008 at 5:32 AM, Stefan Behnel <stefan_ml <at> behnel.de> wrote:
>>  Is there a way you could document the libxml2/libxslt versions used when
>>  uploading binaries? Like, in the file comment on PyPI?
> 
> Right now, only if I do it manually, or if I override the 'upload'
> setuptools command.

Overriding 'upload' isn't practicable as there isn't a hook for it. The
comment is built right before uploading the file.

You could add a line to the package description on the PyPI site manually,
like "the Windows binary downloads on this site statically include libxml2
2.6.XY and libxslt 1.1.Z". Not sure you're currently allowed to do so, though.
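
As a rough sketch, such a description line could be generated from version tuples instead of being typed by hand. The function `binary_note` is hypothetical, and the tuples below are placeholders only; with lxml installed, `etree.LIBXML_COMPILED_VERSION` and `etree.LIBXSLT_COMPILED_VERSION` could be passed instead.

```python
def binary_note(libxml_version, libxslt_version):
    """Format a one-line note about the statically linked library versions."""
    return (
        "The Windows binary downloads on this site statically include "
        "libxml2 %s and libxslt %s."
        % (
            ".".join(map(str, libxml_version)),
            ".".join(map(str, libxslt_version)),
        )
    )


# Placeholder version tuples, not the versions of any actual Windows build.
print(binary_note((2, 6, 31), (1, 1, 22)))
```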

Stefan

René 'Necoro' Neumann | 4 Mar 2008 00:58

[BUG] lxml-2* hangs on interpreter shutdown with gtk-mainloop


Hi,

I'm developing a PyGTK application that uses lxml to validate plugin XML files.
After upgrading to lxml-2*, I noticed that my application does not shut
down correctly (i.e. I close the application, but it keeps running in the
background).

After investigating a bit, I arrived at the attached test case. It hangs
at the end. Example output:

necoro <at> Devoty ~ % ./test.py
lxml.etree:        (2, 0, 2, 0)
libxml used:       (2, 6, 31)
libxml compiled:   (2, 6, 31)
libxslt used:      (1, 1, 22)
libxslt compiled:  (1, 1, 22)
Destroy...
Destroyed
Now after the GTK-Main ... So the main has finished

Important notes:

- it does not hang if I use etree.XML instead of etree.parse
- it does not hang if gobject.threads_init() is not called
- it works with lxml-1.3.6

A quick glance with gdb:
(gdb) bt
#0  0xb7f97410 in __kernel_vsyscall ()
[...]

Elena Soutyrina | 4 Mar 2008 03:07

error installing lxml

I am having trouble installing lxml.

I already installed libxml2 (2.6.30), libxslt, and Cython (0.9.6.12).

easy_install lxml gives me this error:

Building with Cython 0.9.6.12.
Building lxml version 2.0.2.
warning: no previously-included files found matching 'doc/pyrex.txt'
src/lxml/lxml.etree.c:1536: error: syntax error before ‘xmlSchemaSAXPlugStruct’
src/lxml/lxml.etree.c:1536: error: syntax error before ‘xmlSchemaSAXPlugStruct’

What am I missing?

Best regards,

Elena Soutyrina

Application Engineer | Scope Seven
 
310 220 3939 x430
2201 Park Place, Suite 100 | El Segundo, CA 90245
www.scopeseven.com
 

Stefan Behnel | 4 Mar 2008 08:16

Re: error installing lxml

Hi,

Elena Soutyrina wrote:
> I am having trouble to install lxml.
> 
> I already installed libxml2 (2.6.30) and libxslt, Cython (0.9.6.12)

Not installing Cython is generally a good idea, although it shouldn't change
anything here.

> easy_install lxml gives me error  Building with Cython 0.9.6.12.
> 
> Building lxml version 2.0.2.
> 
> warning: no previously-included files found matching 'doc/pyrex.txt'
> 
> src/lxml/lxml.etree.c:1536: error: syntax error before
> 'xmlSchemaSAXPlugStruct'
> 
> src/lxml/lxml.etree.c:1536: error: syntax error before
> 'xmlSchemaSAXPlugStruct'
> 
>  
> 
> What am I missing?

At least, I'm missing the output that comes *before* the gcc errors above.
Could you send that in as well?

Stefan

René 'Necoro' Neumann | 4 Mar 2008 10:22

Re: [BUG] lxml-2* hangs on interpreter shutdown with gtk-mainloop


Just some more information if it is needed:

Python: 2.5.1
PyGTK: 2.12.0
PyGObject: 2.14.0
