Sebastian Breuers | 1 Feb 2011 22:06

etree.XMLSchema throws etree.XMLSchemaParseError on reading CML schema

Dear lxml developers,

I have run into the following issue. As a member of the MoSGrid consortium, 
a project that aims to facilitate molecular simulations in the 
D-Grid environment, I want to use CML (Chemical Markup Language) to 
describe molecular simulation jobs.

I wrote a small validator that uses the lxml.etree.XMLSchema object to 
read the XSD describing the CML3 (located at 
http://www.xml-cml.org/schema/schema3/schema.xsd). It stops reading the 
schema with the lxml.etree.XMLSchemaParseError:

local complex type: The content model is not determinist., line 5962

When I wrote to the developer of CML, he told me that his schema is 
read properly in Java and C# with the Saxon library. I have an idea 
why the XMLSchema object is throwing that exception, but I am not 
sure whether the issue lies with the standard (CML) or with XMLSchema.
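
A minimal sketch of the kind of validator described above, assuming the schema is available as bytes. The helper name `compile_schema` and both toy schemas are hypothetical and stand in for the real CML schema; the only lxml calls used are `etree.fromstring`, `etree.XMLSchema` and the `etree.XMLSchemaParseError` exception.

```python
from lxml import etree

# Hypothetical toy schemas, used only for illustration (not the CML schema).
VALID_XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="molecule" type="xs:string"/>
</xs:schema>"""

# The referenced type does not exist, so schema compilation should fail.
BROKEN_XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="molecule" type="xs:noSuchType"/>
</xs:schema>"""


def compile_schema(xsd_bytes):
    """Return (schema, problem); schema is None if libxml2 rejects the XSD."""
    schema_doc = etree.fromstring(xsd_bytes)
    try:
        return etree.XMLSchema(schema_doc), None
    except etree.XMLSchemaParseError as exc:
        # The exception text is the message reported by libxml2.
        return None, str(exc)


schema, problem = compile_schema(VALID_XSD)
if schema is not None:
    # validate() returns True or False instead of raising.
    print(schema.validate(etree.fromstring(b"<molecule>H2O</molecule>")))

schema, problem = compile_schema(BROKEN_XSD)
if schema is None:
    print("schema rejected:", problem)
```

Catching the exception separates "the schema itself could not be compiled" from "a document failed validation", which is the distinction at stake in this report.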

Your help would be most appreciated.

Kind regards,

Sebastian

--

-- 
_____________________________________________________________________________

Sebastian Breuers               Tel: +49-221-470-4108
EMail: breuerss <at> uni-koeln.de

Stefan Behnel | 2 Feb 2011 07:19

Re: etree.XMLSchema throws etree.XMLSchemaParseError on reading CML schema

Sebastian Breuers, 01.02.2011 22:06:
> I encounter the following issue. As a member of the MoSGrid consortium, a
> project that is aimed to facilitate molecular simulations in the D-Grid
> environment, I want to use the CML (Chemical Markup Language) to describe
> molecular simulation jobs.
>
> I wrote a small validator that uses the lxml.etree.XMLSchema object to read
> the XSD describing the CML3 (located at
> http://www.xml-cml.org/schema/schema3/schema.xsd). It stops reading the
> schema with the lxml.etree.XMLSchemaParseError:
>
> local complex type: The content model is not determinist., line 5962
>
> As I wrote to the developer of the CML he told me that his schema is read
> properly in JAVA and C# with the saxon library. I've got an idea why the
> XMLSchema object is throwing that exception but now I am not quite sure if
> it is an issue with the standard (CML) or with the XMLSchema.

It's usually an issue with the standard of XML-Schema. ;) The problem is 
that the W3C specification is extremely complicated - it's even more 
complex than actually writing a schema, and that's telling, in case you've 
never done that. So the simple fact that there is one tool that can 
successfully parse a W3C schema document doesn't mean that every other 
validation tool can work with it. Specifically, it is a known fact that 
libxml2 (which lxml gets its schema support from) has deficiencies with 
some less widely used schema constructs.
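
For background on the error text: XML Schema requires content models to be deterministic (the "Unique Particle Attribution" rule), meaning a validator must be able to tell which particle an element matches without looking ahead. The sketch below builds two hypothetical toy schemas, one unambiguous and one with the textbook ambiguity (an optional `a` followed by a required `a`), and reports how the installed libxml2 treats each. It is only an illustration of the rule. It does not show which construct in the CML schema triggers the error, nor whether libxml2 is right to reject that schema.

```python
from lxml import etree

XS_NS = "http://www.w3.org/2001/XMLSchema"


def schema_with_sequence(particles):
    """Build a tiny schema whose <root> contains the given sequence particles."""
    return (
        '<xs:schema xmlns:xs="%s">'
        '<xs:element name="root"><xs:complexType><xs:sequence>%s'
        '</xs:sequence></xs:complexType></xs:element>'
        '</xs:schema>' % (XS_NS, particles)
    ).encode("ascii")


def try_compile(xsd_bytes):
    """Return (True, None) if the schema compiles, else (False, message)."""
    try:
        etree.XMLSchema(etree.fromstring(xsd_bytes))
        return True, None
    except etree.XMLSchemaParseError as exc:
        return False, str(exc)


# "a" then "b": every element maps to exactly one particle.
UNAMBIGUOUS = schema_with_sequence(
    '<xs:element name="a" type="xs:string"/>'
    '<xs:element name="b" type="xs:string"/>'
)

# Optional "a" then required "a": on seeing the first <a>, a validator
# cannot know which particle it belongs to without lookahead.
AMBIGUOUS = schema_with_sequence(
    '<xs:element name="a" type="xs:string" minOccurs="0"/>'
    '<xs:element name="a" type="xs:string"/>'
)

for label, xsd in (("unambiguous", UNAMBIGUOUS), ("ambiguous", AMBIGUOUS)):
    ok, message = try_compile(xsd)
    print(label, "->", "compiled" if ok else "rejected: %s" % message)
```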

I suggest this:

1) test the schema with the xmllint command line tool to reproduce the 

Sebastian Breuers | 2 Feb 2011 07:31

Re: etree.XMLSchema throws etree.XMLSchemaParseError on reading CML schema

Am 02.02.2011 07:19, schrieb Stefan Behnel:
> Sebastian Breuers, 01.02.2011 22:06:
>> I encounter the following issue. As a member of the MoSGrid 
>> consortium, a
>> project that is aimed to facilitate molecular simulations in the D-Grid
>> environment, I want to use the CML (Chemical Markup Language) to 
>> describe
>> molecular simulation jobs.
>>
>> I wrote a small validator that uses the lxml.etree.XMLSchema object 
>> to read
>> the XSD describing the CML3 (located at
>> http://www.xml-cml.org/schema/schema3/schema.xsd). It stops reading the
>> schema with the lxml.etree.XMLSchemaParseError:
>>
>> local complex type: The content model is not determinist., line 5962
>>
>> As I wrote to the developer of the CML he told me that his schema is 
>> read
>> properly in JAVA and C# with the saxon library. I've got an idea why the
>> XMLSchema object is throwing that exception but now I am not quite 
>> sure if
>> it is an issue with the standard (CML) or with the XMLSchema.
>
> It's usually an issue with the standard of XML-Schema. ;) The problem 
> is that the W3C specification is extremely complicated - it's even 
> more complex than actually writing a schema, and that's telling, in 
> case you've never done that. So the simple fact that there is one tool 
> that can successfully parse a W3C schema document doesn't mean that 
> every other validation tool can work with it. Specifically, it is a 

jholg | 2 Feb 2011 14:58

Re: etree.XMLSchema throws etree.XMLSchemaParseError on reading CML schema

Hi,

> >> the XSD describing the CML3 (located at
> >> http://www.xml-cml.org/schema/schema3/schema.xsd). It stops reading the
> >> schema with the lxml.etree.XMLSchemaParseError:
> >>
> >> local complex type: The content model is not determinist., line 5962
> >>
> >> As I wrote to the developer of the CML he told me that his schema is 
> >> read
> >> properly in JAVA and C# with the saxon library. I've got an idea why
> the
> >> XMLSchema object is throwing that exception but now I am not quite 
> >> sure if
> >> it is an issue with the standard (CML) or with the XMLSchema.
> >
> > It's usually an issue with the standard of XML-Schema. ;) The problem 
> > is that the W3C specification is extremely complicated - it's even 
> > more complex than actually writing a schema, and that's telling, in 
> > case you've never done that. So the simple fact that there is one tool 
> > that can successfully parse a W3C schema document doesn't mean that 
> > every other validation tool can work with it. Specifically, it is a 
> > known fact that libxml2 (which lxml gets its schema support from) has 
> > deficiencies with some less widely used schema constructs.

Out of curiosity:

libxml2 complains about such a construct

<xsd:complexType>

Paul Tremblay | 4 Feb 2011 07:06

message still not working with xslt?

Hi developers

I believe that when transforming with XSLT, lxml still does not report messages unless the message terminates?

<!--won't produce a message-->
<xsl:template match="root">
<xsl:message>match root</xsl:message>
</xsl:template>

<!--will produce a message-->
<xsl:template match="root">
<xsl:message terminate="yes">match root</xsl:message>
</xsl:template>


lxml code:

    import sys
    from lxml import etree

    # xslt_file, xml_file and param_dict are defined elsewhere
    xslt_doc = etree.parse(xslt_file)
    transform = etree.XSLT(xslt_doc)
    indoc = etree.parse(xml_file)
    outdoc = transform(indoc, **param_dict)
    sys.stdout.write(str(outdoc))

Thanks

Paul

Stefan Behnel | 4 Feb 2011 07:20

Re: message still not working with xslt?

Paul Tremblay, 04.02.2011 07:06:
> I believe the that when transforming xslt, lyxml still does not report
> messages, unless that message terminates?

I'm sure it does, you just have to know where to look. ;)

> <!--won't produce a message-->
> <xsl:template match="root">
> <xsl:message>match root</xsl:message>
> </xsl:template>
>
> <!--will produce a message-->
> <xsl:template match="root">
> <xsl:message terminate="yes">match root</xsl:message>
> </xsl:template>
>
>
> lyxml code
>
> xslt_doc = etree.parse(xslt_file)
> transform = etree.XSLT(xslt_doc)
> indoc = etree.parse(xml_file)
> outdoc = transform(indoc, **param_dict)
> sys.stdout.write(str(outdoc))

Hmm, interesting, that seems to be missing from the XSLT docs completely.

You should get at the messages through the error log of the XSLT object 
(most lxml.etree objects have one).

http://codespeak.net/lxml/parsing.html#error-log
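
A minimal sketch of that suggestion, assuming an inline stylesheet and input document instead of files. The helper name `transform_with_messages` is hypothetical; `etree.XSLT` and its `error_log` property are the lxml pieces referred to above.

```python
from lxml import etree

# Non-terminating xsl:message, as in the first template of the question.
STYLESHEET = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="root">
    <xsl:message>match root</xsl:message>
  </xsl:template>
</xsl:stylesheet>"""


def transform_with_messages(xslt_bytes, xml_bytes):
    """Run the transform and return (result, list of logged message texts)."""
    transform = etree.XSLT(etree.XML(xslt_bytes))
    result = transform(etree.XML(xml_bytes))
    # xsl:message output is collected in the error log of the XSLT object.
    messages = [entry.message for entry in transform.error_log]
    return result, messages


result, messages = transform_with_messages(STYLESHEET, b"<root/>")
for text in messages:
    print(text)
```

The messages are not written to stdout or stderr by the transform itself, which is why they appear to be missing unless the log is inspected.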

Stefan
Paul Tremblay | 4 Feb 2011 08:12

Re: message still not working with xslt?

On 2/4/11 1:20 AM, Stefan Behnel wrote:
> Paul Tremblay, 04.02.2011 07:06:
>> I believe the that when transforming xslt, lyxml still does not report
>> messages, unless that message terminates?
>
> I'm sure it does, you just have to know where to look. ;)
>
>
>> <!--won't produce a message-->
>> <xsl:template match="root">
>> <xsl:message>match root</xsl:message>
>> </xsl:template>
>>
>> <!--will produce a message-->
>> <xsl:template match="root">
>> <xsl:message terminate="yes">match root</xsl:message>
>> </xsl:template>
>>
>>
>> lyxml code
>>
>> xslt_doc = etree.parse(xslt_file)
>> transform = etree.XSLT(xslt_doc)
>> indoc = etree.parse(xml_file)
>> outdoc = transform(indoc, **param_dict)
>> sys.stdout.write(str(outdoc))
>
> Hmm, interesting, that seems to be missing from the XSLT docs completely.
>
> You should get at the messages through the error log of the XSLT 
> object (most lxml.etree objects have one).
>
> http://codespeak.net/lxml/parsing.html#error-log
>
>
Yes, that works. Thanks.

    print(len(transform.error_log))
    error_obj = transform.error_log[0]
    print(error_obj.message)
    print(error_obj.line)
    print(error_obj.column)

According to your link, each entry in the error_log has at least the 3 
attributes I've illustrated above. Are there any more I'm missing?

Paul
Stefan Behnel | 4 Feb 2011 09:05

Re: message still not working with xslt?

Paul Tremblay, 04.02.2011 08:12:
> On 2/4/11 1:20 AM, Stefan Behnel wrote:
>> You should get at the messages through the error log of the XSLT
>> object (most lxml.etree objects have one).
>>
>> http://codespeak.net/lxml/parsing.html#error-log
>
> Yes, that works. Thanks.
>
> print len(transform.error_log)
>       error_obj =  transform.error_log[0]
>       print error_obj.message
>       print error_obj.line
>       print error_obj.column
>
> According to our link, the error_log has at least 3 methods as I've
> illustrated above. Are there any more I'm missing?

http://codespeak.net/lxml/dev/api/lxml.etree._LogEntry-class.html
http://codespeak.net/lxml/dev/api/lxml.etree._ErrorLog-class.html
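
As a rough illustration of what those two classes offer, the sketch below triggers a parser error and then reads the attributes of the resulting log entry. The `describe` helper is hypothetical. The attribute names and `filter_from_errors()` are taken from the `_LogEntry` and `_ErrorLog` classes linked above.

```python
from lxml import etree


def describe(entry):
    """Collect the attributes of one log entry into a plain dict."""
    return {
        "message": entry.message,
        "line": entry.line,
        "column": entry.column,
        "filename": entry.filename,
        # Each ID is an integer with a matching *_name string.
        "domain": (entry.domain, entry.domain_name),
        "level": (entry.level, entry.level_name),
        "type": (entry.type, entry.type_name),
    }


parser = etree.XMLParser()
try:
    etree.XML("<root></b>", parser)  # mismatched tags, parsing fails
except etree.XMLSyntaxError:
    pass

entries = [describe(entry) for entry in parser.error_log]
for info in entries:
    print(info)

# The log can also be filtered, here to entries of level ERROR or worse.
errors_only = parser.error_log.filter_from_errors()
print(len(errors_only))
```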

Sorry for the lack of API documentation. Cython doesn't currently support 
attaching a docstring to an automatically generated property (this has been 
an often requested feature but no-one has taken care of it so far). But 
I'll add at least a doc comment to the _LogEntry class.

As always, documentation fixes are welcome.

http://codespeak.net/svn/lxml/trunk/doc/

Stefan
Paul Tremblay | 5 Feb 2011 04:10

Re: message still not working with xslt?

On 2/4/11 3:05 AM, Stefan Behnel wrote:
>
> http://codespeak.net/lxml/dev/api/lxml.etree._LogEntry-class.html
> http://codespeak.net/lxml/dev/api/lxml.etree._ErrorLog-class.html
>
> Sorry for the lack of API documentation. Cython doesn't currently 
> support attaching a docstring to an automatically generated property 
> (this has been an often requested feature but no-one has taken care of 
> it so far). But I'll add at least a doc comment to the _LogEntry class.
>
> As always, documentation fixes are always welcome.
>
> http://codespeak.net/svn/lxml/trunk/doc/
>
>

Here's a documentation fix. I'm not sure about some of the properties, 
such as level, etc.

Error log
---------

Parsers have an ``error_log`` property that lists the errors of the
last parser run. Each ``error_log`` is a list, and each item in the
list is an object that has the following properties:

* ``column``: an integer that identifies the column where the error occurred.
* ``domain``: an integer that identifies the error domain.
* ``domain_name``: a unicode string that names the error domain.
* ``filename``: a unicode string
* ``level``: an integer
* ``level_name``: a unicode string
* ``line``: an integer that identifies the line where the error occurred.
* ``message``: a unicode string that lists the message.
* ``type``: an integer
* ``type_name``: a unicode string

.. sourcecode:: pycon

 >>> parser = etree.XMLParser()
 >>> print(len(parser.error_log))
   0

 >>> tree = etree.XML("<root></b>", parser)
   Traceback (most recent call last):
     ...
   lxml.etree.XMLSyntaxError: Opening and ending tag mismatch: root line 1 and b, line 1, column 11

 >>> print(len(parser.error_log))
   1

 >>> error = parser.error_log[0]
 >>> print(error.message)
   Opening and ending tag mismatch: root line 1 and b
 >>> print(error.line)
   1
 >>> print(error.column)
   11

The following code shows how to output messages from xsl:message when 
processing XSLT.

.. sourcecode:: pycon

 >>> from io import StringIO
 >>> f = StringIO('''
 ... <xsl:stylesheet
 ...     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 ...     xmlns:fo="http://www.w3.org/1999/XSL/Format"
 ...     version="1.1"
 ... >
 ... <xsl:template match="root">
 ... <xsl:message >
 ... <xsl:text>found root</xsl:text>
 ... </xsl:message>
 ... <xsl:apply-templates/>
 ... </xsl:template>
 ...
 ... <xsl:template match="para">
 ... <xsl:message >
 ... <xsl:text>found para</xsl:text>
 ... </xsl:message>
 ... </xsl:template>
 ... </xsl:stylesheet>
 ...
 ... ''')
 >>> xslt_doc = etree.parse(f)
 >>> transform = etree.XSLT(xslt_doc)
 >>> f = StringIO('<root><para>Text</para></root>')
 >>> doc = etree.parse(f)
 >>> result_tree = transform(doc)
 >>> for error in transform.error_log:
 ...     print('message from line %s, col %s:' % (error.line, error.column))
 ...     print(error.message)
 ...     print()
 ...     print('domain_name: %s' % error.domain_name)
 ...     print('filename: %s' % error.filename)
 ...     print('level: %s' % error.level)
 ...     print('level_name: %s' % error.level_name)
 ...     print('type: %s' % error.type)
 ...     print('type_name: %s' % error.type_name)
 ...     print('=================================================')
  message from line 0, col 0:
  found root

  domain_name: XSLT
  filename: <string>
  level: 2
  level_name: ERROR
  type: 0
  type_name: ERR_OK
  =================================================
  message from line 0, col 0:
  found para

  domain_name: XSLT
  filename: <string>
  level: 2
  level_name: ERROR
  type: 0
  type_name: ERR_OK
  =================================================
Stefan Behnel | 5 Feb 2011 12:12

Re: message still not working with xslt?

Paul Tremblay, 05.02.2011 04:10:
> Here's a documentation fix. I'm not sure about some of the properties,
> such as level, etc.

Thanks. I fixed it up somewhat and moved the second part over to the XSLT 
documentation.

http://codespeak.net/svn/lxml/trunk/doc/

Stefan
