Stefan Behnel | 1 Aug 2008 08:47

Re: Segfault on XSLT/XPath undefined variable error.

Hi,

John Krukoff wrote:
> Okay, can only get it to crash when first signing a document using
> libxmlsec, so I suppose I'll simply assume that the two libraries use
> the error log in incompatible ways.

could you check if this patch makes it work better for you? It basically
restricts XSLT error logging to the lifetime of an XSL transformation.

Stefan

_______________________________________________
lxml-dev mailing list
lxml-dev <at> codespeak.net
http://codespeak.net/mailman/listinfo/lxml-dev
Dr R. Sanderson | 1 Aug 2008 17:16

Python 3.0 Support


Back in May, Stefan wrote:
>  [but yes, there will be lxml for Python 3, and pretty soon]

Any news on the Py3k front?

(I'm in the process of scoping out just how hard it's going to be to 
update our code to 3000, starting with the dependencies)

Many Thanks!

Rob
Stefan Behnel | 3 Aug 2008 08:41

Re: Python 3.0 Support

Hi,

Dr R. Sanderson wrote:
> Back in May, Stefan wrote:
>>  [but yes, there will be lxml for Python 3, and pretty soon]
> 
> Any news on the Py3k front?

It's there in general, so you can compile lxml under Py3 and run your code
against it for pure testing purposes.

However, due to changes in Py3.0 beta2, you can get crashes in the exception
handling code that Cython generates. There seem to be slight changes in the
way exceptions interact with the frame cleanup in Py3 now. And Cython does not
use frames at all but emulates them, apparently not well enough for the latest
Py3 beta...

I'm working on fixing this, but I don't know when this will be done. It may
take a couple of weeks, and will require a new source release of 2.1.x.

Stefan
Dr R. Sanderson | 3 Aug 2008 17:18

Re: Python 3.0 Support


>>>  [but yes, there will be lxml for Python 3, and pretty soon]
>> Any news on the Py3k front?

> It's there in general, so you can compile lxml under Py3 and run your code
> against it for pure testing purposes.

Fantastic :)

And the thinko that was causing my problem is that fromstring() is all 
lowercase not fromString(). Duh.  Haven't run into any of the crashes 
yet.

> However, due to changes in Py3.0 beta2, you can get crashes in the exception
> [...]
> I'm working on fixing this, but I don't know when this will be done. It may
> take a couple of weeks, and will require a new source release of 2.1.x.

No problem!

Many thanks for the prompt reply,

Rob
John J Lee | 4 Aug 2008 14:10

Passing UTF-8 bytestrings to lxml

Hi

Apologies in advance if this is the wrong list -- I'm suggesting a change 
to lxml, so I guess this is the right place...

I'm working on some existing code that makes use of both unicode objects 
and UTF-8 encoded bytestring objects (both of which sometimes contain 
non-ASCII characters).  I'm making changes to the code to ensure that it 
supports the unicode character set.  Unfortunately, it's not practical to 
change all of the code to use unicode objects (partly because there's a 
lot of code, and partly because fixing that would probably entail fixing 
PyGTK to return unicode objects instead of UTF-8 encoded bytestrings). 
So, the plan is to live with both unicode and UTF-8 encoded bytestrings, 
and to ensure Python's default encoding is always set to UTF-8.  I'm sure 
the wisdom of that approach could be debated (!), but I hope that somebody 
will be kind enough to answer the following question anyway :-)

Looking at the code, it seems that changing function _utf8 in 
apihelpers.pxi to accept UTF-8 encoded bytestrings (see patch below) would 
be sufficient to make lxml accept UTF-8 encoded bytestrings. Indeed, that 
seems to work.

1. Will what I'm doing subtly break lxml in some way if I make use of this 
patched lxml in my own code?

2. Should lxml be changed in this way?  If it's considered important to 
avoid accidentally passing non-ASCII bytestrings to lxml, would it be 
acceptable to add a global switch to enable accepting UTF-8 encoded 
bytestrings?


Stefan Behnel | 4 Aug 2008 16:07

Re: Passing UTF-8 bytestrings to lxml

Hi,

John J Lee wrote:
> Apologies in advance if this is the wrong list -- I'm suggesting a change
> to lxml, so I guess this is the right place...

We only have one mailing list, so this is definitely the right place.

> Looking at the code, it seems that changing function _utf8 in
> apihelpers.pxi to accept UTF-8 encoded bytestrings (see patch below) would
> be sufficient to make lxml accept UTF-8 encoded bytestrings. Indeed, that
> seems to work.

The internal encoding used by libxml2 is UTF-8, so I don't expect any
problems when you pass in UTF-8 directly - as long as you can make sure
that it's really a valid UTF-8 byte sequence.

> 2. Should lxml be changed in this way?  If it's considered important to
> avoid accidentally passing non-ASCII bytestrings to lxml

I consider that important, yes. The support for ASCII byte strings is a
pure convenience as ASCII names are extremely common in XML *and* they are
compatible with unicode strings in Python 2.x. Allowing anything other
than ASCII here would open the door for all sorts of hard to track down
encoding problems, as you would no longer get an exception when you
accidentally pass ISO encoded non-ASCII strings, for example.

Note that when lxml runs under Python 3, it will not allow you to pass
byte strings into the API at all (except for parsing, obviously).
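
The policy described above (text is always fine, byte strings only when
they are plain ASCII) can be sketched in a few lines. This is not lxml's
actual _utf8 implementation; to_utf8 is a hypothetical helper, written for
Python 3 where str is the text type:

```python
def to_utf8(value):
    """Hypothetical helper: return UTF-8 bytes for a name or text value.

    Text (str) is encoded as UTF-8. Byte strings are accepted only if they
    are pure ASCII, so an accidentally ISO-encoded string raises an error
    instead of being stored as invalid UTF-8.
    """
    if isinstance(value, str):
        return value.encode("utf-8")
    if isinstance(value, bytes):
        try:
            value.decode("ascii")  # validation only; the result is discarded
        except UnicodeDecodeError:
            raise ValueError(
                "byte strings must be plain ASCII; pass text as str"
            ) from None
        return value
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)
```

With this check, to_utf8(b"tag") passes through unchanged, while a Latin-1
encoded b"\xe9" is rejected up front rather than causing trouble later.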


John J Lee | 4 Aug 2008 20:02

Re: Passing UTF-8 bytestrings to lxml

On Mon, 4 Aug 2008 16:07:33 +0200 (CEST), "Stefan Behnel"
<stefan_ml <at> behnel.de> said:
[...]
> > Looking at the code, it seems that changing function _utf8 in
> > apihelpers.pxi to accept UTF-8 encoded bytestrings (see patch below) would
> > be sufficient to make lxml accept UTF-8 encoded bytestrings. Indeed, that
> > seems to work.
> 
> The internal encoding used by libxml2 is UTF-8, so I don't expect any
> problems when you pass in UTF-8 directly - as long as you can make sure
> that it's really a valid UTF-8 byte sequence.

Thanks for this, it's very helpful.  I have a follow-up question,
though.

On discovering the fact that unicode strings containing non-ASCII
characters don't hash to the same value as their UTF-8 equivalent
bytestring (despite the fact that, for example, they compare equal, when
the default encoding is set to UTF-8), I'm having second thoughts about my
mixed-str-and-unicode scheme:

>>> import sys
>>> reload(sys)
<module 'sys' (built-in)>
>>> sys.setdefaultencoding("utf-8")
>>> hash(u"\xa3")
-610773982
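
The practical consequence of mixing the two types as dictionary keys can
be illustrated as follows. This sketch assumes Python 3, where text and
byte strings never compare equal at all, so the lookup fails even more
directly than in the Python 2 session above; as_text is a hypothetical
helper, not part of lxml or the standard library:

```python
def as_text(key, encoding="utf-8"):
    """Hypothetical helper: normalise a bytes or str key to str."""
    if isinstance(key, bytes):
        return key.decode(encoding)
    return key


pound_text = "\xa3"                        # POUND SIGN as text
pound_bytes = pound_text.encode("utf-8")   # the same character as UTF-8 bytes

prices = {pound_text: "GBP"}

# The raw byte string does not find the entry stored under the text key.
found_raw = pound_bytes in prices

# Normalising every key to one type at the boundary makes lookups reliable.
found_normalised = as_text(pound_bytes) in prices
```

Here found_raw is False and found_normalised is True, which is the usual
argument for converting to a single string type at the edges of a program.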

Stefan Behnel | 4 Aug 2008 20:10

Re: Passing UTF-8 bytestrings to lxml

Hi,

John J Lee wrote:
> So, my question: were I also to change the function funicode (also in 
> apihelpers.pxi) to return UTF-8 bytestrings, would lxml always return 
> UTF-8 bytestring objects from all of its API calls?

funicode() is a very central function that is called whenever a UTF-8 byte
sequence is to be converted to a Python string. I won't give you a guarantee
that everything will work if you change it, but at least I don't see a major
problem at first sight.

Stefan
Niels Bjerre | 5 Aug 2008 16:50

Re: Transform parameter variables

Stefan Behnel <stefan_ml <at> behnel.de> writes:

> 
> Hi,
> 
> it's good practice to
> 
> a) reply to the list,
> b) avoid top-posting and
> c) read what people post.
> 
> Niels Bjerre wrote:
> > Thank You for your response
> >
> > But I have problem with the_dict:
> > These 2 statements don't give the same result.
> >
> >    1. newdoc = transform(places.myplaces, area="'3751'")
> >    2. newdoc = transform(places.myplaces, {'area':'\"\'3751\'\"'})
> >
> > 1. is passed to xslt: <xsl:param name="area" select="'ost'"/>
> > 2. is ignored
> >
> > I have tried with
> > {'area':'\'3751\''} and others
> 
> The last line will work, but as I wrote before:
> 
> > 2008/7/28 Stefan Behnel <stefan_ml <at> behnel.de>
> >> You can pass more than one keyword parameter, as everywhere in Python.

jholg | 5 Aug 2008 17:24

Re: Transform parameter variables



> Note the two stars before "the_dict". This is standard Python syntax for
> expanding a mapping into keyword arguments.
>
> Stefan
>

I'm sorry - still no luck passing a dictionary as extensions parameter.
The stylesheet has a parameter:
<xsl:param name="area" select="'ost'"/>

The parameter is picked up in the transformation if I use:
transform(doc, area="'3751'")
but not when I use the_dict:
transform(doc, {'area':'\"\'3751\'\"'}) or any other variant of a dictionary or
a dict_variable I can think of!

Try

transform(doc, **{'area':"3751"})

Note the two stars, and read up on Python syntax for function calls and
keyword parameters.
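
The difference between passing the dictionary itself and expanding it with
two stars can be seen without lxml. In this sketch, transform is a
hypothetical stand-in that only returns the keyword parameters it was
given; it is not the real XSLT object, and doc is a placeholder:

```python
def transform(doc, **params):
    """Hypothetical stand-in for a compiled stylesheet.

    It performs no transformation; it only returns the keyword parameters
    it received, so the effect of ** can be inspected.
    """
    return params


doc = "<places/>"                  # placeholder document
the_dict = {"area": "'3751'"}      # value is an XPath string literal

by_keyword = transform(doc, area="'3751'")
by_expansion = transform(doc, **the_dict)   # same call, built from a dict

# Without the two stars the dict is a second positional argument, which
# this stand-in (having no such parameter) rejects with a TypeError.
positional_error = None
try:
    transform(doc, the_dict)
except TypeError as exc:
    positional_error = str(exc)
```

Both by_keyword and by_expansion hold the same mapping. Note that lxml
evaluates parameter values as XPath expressions, so "'3751'" is the string
'3751' while "3751" is the number 3751.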



