Konstantin Ryabitsev | 3 Jan 2008 16:29

Help with an error message

Hi, everyone:

I'm having trouble with the following case. One of my automatic import
scripts takes data from one source and submits it to another as an XML
feed. Recently, it started failing because one of the entries contains
a null. The test case is as follows:

from lxml.etree import Element
sourcestr = 'Contains a null: \x00'
unistr = unicode(sourcestr, 'utf-8')
elt = Element('foo').text = unistr

Running it will cause the following error:

Traceback (most recent call last):
  File "foo.py", line 6, in <module>
    elt = Element('foo').text = unistr
  File "etree.pyx", line 741, in etree._Element.text.__set__
  File "apihelpers.pxi", line 344, in etree._setNodeText
  File "apihelpers.pxi", line 648, in etree._utf8
AssertionError: All strings must be XML compatible, either Unicode or ASCII

Can someone suggest the best way to deal with this?

Kind regards,
-- 
Konstantin Ryabitsev
Montréal, Québec
_______________________________________________
lxml-dev mailing list

Stefan Behnel | 3 Jan 2008 17:30

Re: Help with an error message

Hi,

Konstantin Ryabitsev wrote:
> I'm having trouble with the following case. One of my automatic import
> scripts takes data from one source and submits it to another as an XML
> feed. Recently, it started failing because one of the entries contains
> a null. The testcase is such:
> 
> from lxml.etree import Element
> sourcestr = 'Contains a null: \x00'
> unistr = unicode(sourcestr, 'utf-8')
> elt = Element('foo').text = unistr
> 
> Running it will cause the following error:
> 
> Traceback (most recent call last):
>   File "foo.py", line 6, in <module>
>     elt = Element('foo').text = unistr
>   File "etree.pyx", line 741, in etree._Element.text.__set__
>   File "apihelpers.pxi", line 344, in etree._setNodeText
>   File "apihelpers.pxi", line 648, in etree._utf8
> AssertionError: All strings must be XML compatible, either Unicode or ASCII
> 
> Can someone suggest the best way to deal with this?

My first question is: why do you need a '\x00' here? If you want to pass
binary data in XML, the best way is to encode it safely first, e.g. with
uuencode or base64. That encoding should then be part of your XML language
spec/schema/...
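A minimal sketch of this suggestion, using base64 rather than uuencode: the raw bytes are turned into plain ASCII before they go into the element text and are decoded again on the receiving side. The `encoding="base64"` attribute is a hypothetical convention that you would define in your own schema. The import falls back to the standard library's ElementTree, which supports the small subset of the API used here.

```python
import base64

try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    # Fallback: the stdlib ElementTree offers the same few calls used below.
    import xml.etree.ElementTree as etree


def encode_payload(tag, data):
    """Wrap arbitrary bytes in an element as base64 text (always XML-safe ASCII)."""
    elem = etree.Element(tag)
    elem.set("encoding", "base64")  # hypothetical marker telling the receiver how to decode
    elem.text = base64.b64encode(data).decode("ascii")
    return elem


def decode_payload(elem):
    """Reverse of encode_payload: recover the original bytes."""
    return base64.b64decode(elem.text)


raw = b"Contains a null: \x00"
xml = etree.tostring(encode_payload("foo", raw))    # no NUL byte in the output
restored = decode_payload(etree.fromstring(xml))
print(restored == raw)
```

The receiver has to know about the encoding, which is why it belongs in the feed's schema rather than being a private convention of the import script.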

Stefan

Rob Sanderson | 3 Jan 2008 17:33

Re: Help with an error message


The null character makes the XML non-well-formed anyway.

The legal character ranges for XML (as per the spec, section 2.2):

Char   ::=   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

Definitely no \x00!
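The production above maps directly onto a regular expression. Here is a minimal sketch, assuming Python 3 `str` input, that reports or strips characters outside the allowed ranges. The helper names are hypothetical. Stripping silently loses data, so it only fits when the offending characters are junk; if they carry meaning, encode the data instead.

```python
import re

# Matches any character outside the XML 1.0 Char production (spec section 2.2):
# Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_ILLEGAL_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def illegal_xml_chars(text):
    """Return (index, code point) pairs for every character XML 1.0 forbids."""
    return [(m.start(), ord(m.group())) for m in _ILLEGAL_XML_CHARS.finditer(text)]


def strip_illegal_xml_chars(text, replacement=""):
    """Replace forbidden characters; lossy, so only use it for junk characters."""
    return _ILLEGAL_XML_CHARS.sub(replacement, text)


print(illegal_xml_chars("Contains a null: \x00"))              # [(17, 0)]
print(repr(strip_illegal_xml_chars("Contains a null: \x00")))  # 'Contains a null: '
```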

So I would base64-encode any offending data, as Stefan suggested.

Rob

On Thu, 2008-01-03 at 17:30 +0100, Stefan Behnel wrote:
> Konstantin Ryabitsev wrote:
> > I'm having trouble with the following case. One of my automatic import
> > scripts takes data from one source and submits it to another as an XML
> > feed. Recently, it started failing because one of the entries contains
> > a null. 

> My first question is: why do you need a '\x00' here? If you want to pass
> binary data in XML, the best way is to use a safe encoding such as uuencode or
> whatever. That should be part of your XML language spec/schema/...

Stefan Behnel | 3 Jan 2008 17:57

Re: Help with an error message

Hi,

Rob Sanderson wrote:
> The null character makes the XML non-well-formed anyway.
> 
> The legal character ranges for XML (as per the spec, section 2.2):
> 
> Char   ::=   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] |
> [#x10000-#x10FFFF]
> 
> Definitely no \x00!

That's true. On the XML /generator/ side you could get away with adding an
Entity (lxml 2.0 will let you do that), but that just lets you write out
broken XML that the recipient will not be able to parse:

  >>> from lxml import etree as et
  >>> el = et.Element("test")
  >>> el.text = "mind the "
  >>> el.append(et.Entity("#0"))
  >>> xml = et.tostring(el)
  >>> xml
  '<test>mind the &#0;</test>'

  >>> et.fromstring(xml)
  Traceback (most recent call last):
    ...
  lxml.etree.XMLSyntaxError: xmlParseCharRef: invalid xmlChar value 0, line 1, column 20

Maybe we should fix the Entity() factory here to prevent such misuse...
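A minimal sketch of the kind of check suggested here: validate a character-reference name such as "#0" or "#x41" against the XML 1.0 Char production before it is turned into an entity. This is a hypothetical helper, not lxml's actual implementation; in application code you would call it before `etree.Entity(name)`.

```python
import re

# Character references as they appear in an entity name: "#" + decimal, or "#x" + hex.
_CHAR_REF = re.compile(r"#(?:x([0-9A-Fa-f]+)|([0-9]+))")


def is_xml_char(cp):
    """True if code point cp is allowed by the XML 1.0 Char production."""
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def check_char_ref(name):
    """Validate a name like "#65" or "#x41" before passing it to Entity().

    Returns the referenced code point, or None for ordinary named entities
    (which this sketch does not check). Raises ValueError for malformed
    references and for references to characters XML does not allow.
    """
    if not name.startswith("#"):
        return None
    m = _CHAR_REF.fullmatch(name)
    if m is None:
        raise ValueError("malformed character reference: &%s;" % name)
    hex_digits, dec_digits = m.groups()
    cp = int(hex_digits, 16) if hex_digits is not None else int(dec_digits)
    if not is_xml_char(cp):
        raise ValueError("&%s; refers to a character XML 1.0 does not allow" % name)
    return cp
```

For example, `check_char_ref("#x41")` returns 65, while `check_char_ref("#0")` raises ValueError instead of producing the unparseable `&#0;` shown above.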


Kenneth Miller | 4 Jan 2008 21:13

XML Schemas (XSD) and Objectification

All,

     Is there any way to use an XSD file to generate an object in Python
using Objectify?

Regards,
Ken

John Lovell | 4 Jan 2008 21:59

Re: XML Schemas (XSD) and Objectification

Ken:

While I do not know if you can do this using Objectify, you should look
at generateDS as a backup.

http://www.rexx.com/~dkuhlman/generateDS.html

Good luck,

John W. Lovell
Web Applications Engineer
Northwest Educational Service District
1601 R Avenue
Anacortes, WA 98221
(360) 299-4086
jlovell <at> esd189.org

www.esd189.org
Together We Can ...


Stefan Behnel | 4 Jan 2008 22:45

Re: XML Schemas (XSD) and Objectification

Hi,

Kenneth Miller wrote:
>      Is there any way to use an XSD file to generate an object in  
> python using Objectify?

Hmm, lxml.objectify actually works "as is", based on the XML document itself
(i.e. an 'instance' of the schema), but without any schema interaction. What
are you trying to achieve? Type enforcement based on schema types?

Or do you mean 'generate an object' in the sense that you want to map an
objectify object to a plain Python object?

Stefan

Stefan Behnel | 5 Jan 2008 10:02

Re: XML Schemas (XSD) and Objectification

Hi,

please reply also to the list.

Kenneth Miller wrote:
> On Jan 4, 2008, at 3:45 PM, Stefan Behnel wrote:
>> Kenneth Miller wrote:
>>>     Is there any way to use an XSD file to generate an object in
>>> python using Objectify?
>>
>> Hmm, lxml.objectify actually works "as is", based on the XML document
>> itself
>> (i.e. an 'instance' of the schema), but without any schema
>> interaction. What
>> are you trying to achieve? Type enforcement based on schema types?
>>
>> Or do you mean 'generate an object' in the sense that you want to map an
>> objectify object to a plain Python object?
>>
> I'd like to be able to simply create the objects defined by the schema
> as python objects.

:) repeating an answer doesn't always help in understanding it.

But I think what you mean is: you have a schema and you want to generate
source code for Python objects *in advance* to represent its document
instances. That's not how lxml.objectify works.

What lxml.objectify does is this: you give it a document instance (no schema
involved) and it creates Python objects for you *at runtime* to represent
[...]
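A minimal sketch of that runtime behaviour, assuming a hypothetical `order` document and schema. The schema is attached to the parser through `objectify.makeparser(schema=...)` purely for validation; no Python classes are generated from it, and objectify still guesses the Python type of each leaf from its text. If you need source code generated from an XSD in advance, that is a different tool's job (generateDS was mentioned earlier in this thread).

```python
from lxml import etree, objectify

# Hypothetical schema, used here only to validate documents while parsing.
XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="order">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="id" type="xs:integer"/>
        <xs:element name="customer" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

DOC = b"<order><id>42</id><customer>Ken</customer></order>"

schema = etree.XMLSchema(etree.fromstring(XSD))
parser = objectify.makeparser(schema=schema)  # extra keywords go to etree.XMLParser

# The Python objects are built from the document instance at parse time.
order = objectify.fromstring(DOC, parser)
print(order.id.pyval + 1)   # 43: objectify guessed an int from the text "42"
print(order.customer.text)  # Ken
```

An invalid document, e.g. one with non-numeric text in `<id>`, is rejected by the parser with `etree.XMLSyntaxError` before any objects are built.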

Stefan Behnel | 5 Jan 2008 20:39

Re: lxml \ libxslt \ libxml2 leads to apache 2 crash on freebsd/amd64

Hi Dmitri,

Stefan Behnel wrote:
> The way XSLT is implemented in lxml is a bit tricky, as libxslt makes some
> things hard to control that lxml uses in libxml2 for performance reasons. In
> particular, lxml uses a thread-local hash table for constant strings, which is
> much faster than a malloc() for each string that occurs in a document.
> However, libxslt doesn't honour this dictionary and creates its own one based
> on the stylesheet dictionary. The result is that the stylesheet can leak into
> the result document through string references that now point into the hash
> table of the stylesheet.
> 
> There isn't a way in libxslt that would allow us to prevent this or to control
> the allocation. That's why I decided to restrict the execution of XSL
> transformations to threads that inherit the same hash table as the stylesheet,
> this should normally prevent any problems.

Here is a trivial patch (the one against xslt.pxi) that, instead of raising an
exception, copies the stylesheet into the current thread context, and thus
works around the current thread restrictions. It seems to work for me, any
chance you could give it a try?

In case it doesn't work reliably, could you additionally check the second
change (in parser.pxi)? It should restrict 'acceptable' hash tables to the
local thread, instead of also accepting the main thread's table as it did
before.

Stefan

=== src/lxml/xslt.pxi
==================================================================
[...]

Stefan Behnel | 7 Jan 2008 11:14

cssselect and cssutils

Hi Christof,

Höke, Christof wrote:
> You are the main developer for lxml, right?

Yep, but not the only one. :)

> I was trying the CSSSelect
> facility for a Python CSS library I am developing
> (http://code.google.com/p/cssutils/)

Cool. I knew about cssutils and felt that its field of application was related
to cssselect (and lxml in general) without overlapping too much, and I always
thought it would be nice to have it working with lxml in some way.

> and I think there are some minor
> problems with "*" or "*|*" (I need to check again and I'll put them on the
> bug tracker then) but a question regarding support for pseudo selectors: 
> Would it be possible to support stuff like :first-letter (currently not
> working is it not?) with Python XPath extension functions which should be
> able to do what XPath cannot? Are you maybe even working on it? I guess
> things like :first-line are problematic but other should be ok.

I'm not the primary person to ask here. cssselect was developed by Ian
Bicking, he knows best what works, what doesn't, and how to fix it. :)
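To make Christof's idea concrete, here is a minimal sketch of a Python XPath extension function registered through the `extensions` argument of `etree.XPath`. The namespace URI and the `first-letter` function name are hypothetical, and this is only a rough approximation of the CSS `:first-letter` pseudo-element (first non-whitespace character of the text content), not how cssselect handles pseudo-elements.

```python
from lxml import etree

NS = "urn:example:css-ext"  # hypothetical namespace for the extension function


def first_letter(context, nodes):
    """Return the first non-whitespace character of the first node's text content."""
    if not nodes:
        return ""
    text = "".join(nodes[0].itertext()).lstrip()
    return text[:1]


# lxml passes the evaluation context first, then the XPath arguments.
find_first_letter = etree.XPath(
    "ext:first-letter(//p)",
    namespaces={"ext": NS},
    extensions={(NS, "first-letter"): first_letter},
)

doc = etree.fromstring("<div><p>  Hello <b>there</b></p><p>World</p></div>")
print(find_first_letter(doc))  # H
```

A CSS-to-XPath translator could emit calls like this for pseudo-selectors that plain XPath cannot express; layout-dependent ones such as `:first-line` remain out of reach, as Christof suspects.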

> If I get the time I would try some things out and report back, this was
> just an idea that I had while playing with CSSSelector...

Go ahead, this is open source. Any help, testing and ideas are always appreciated.

