Re: parsing DTDs - listing of valid elements
Stefan Behnel <stefan_ml <at> behnel.de>
2009-07-01 06:14:57 GMT
Hi,
Elliott Slaughter wrote:
> I'm trying to get the elements in a DTD. Since these internals are not
> exported in the Python interface of lxml.etree, I am trying to write a
> Cython extension to do so, as previously suggested on this mailing list (see
> link below).
>
> http://codespeak.net/pipermail/lxml-dev/2009-January/004298.html
>
> To quote the message, "all you'd really need is the internal _c_dtd field of
> the DTD class, which you could cimport". I'm wondering exactly how I am
> supposed to do that
> [...]
> Here is what I've tried so far (on Python 2.5.4, Cython 0.11.2, Windows):
>
> The DTD class is not declared in etreepublic.pxd, so I can't just "cimport
> etreepublic". The actual DTD class definition is in dtd.pxi, as stated in
> the message. But I can't just "include 'dtd.pxi' " because it inherits from
> the _Validator class in lxml.etree.pyx . And I can't "cimport lxml.etree"
> because there is no file lxml.etree.pxd.

True. So your only chance is to write one yourself. And yes, it needs to be
called "lxml.etree.pxd".

> I tried writing a lxml.etree.pxd file to circumvent these barriers (which
> was thoroughly confusing because _Validator contains an _ErrorLog which made
> me search through several other files...),

All you should really need is this:

    cimport tree

    cdef class _Validator:
        cdef object _error_log

    cdef class DTD(_Validator):
        cdef tree.xmlDtd* _c_dtd

Cython needs to know the exact /layout/ of the classes that you use (at
least if they are not exported as C header files), but it doesn't need to
know the exact class types of attributes. "object" will do just fine if you
don't care.
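
As a rough sketch of how such a declaration file might then be used (this
is not code from lxml itself): assume the declarations above are saved as
"lxml.etree.pxd" on Cython's include path, that lxml's own tree.pxd is
reachable from there as well, and that tree.pxd declares the "children"
field of xmlDtd and the XML_ELEMENT_DECL constant the way libxml2's headers
do. Under those assumptions, a hypothetical extension module (called
dtdnames.pyx here) could walk the DTD's child nodes and collect the names
of the declared elements:

```cython
# dtdnames.pyx: hypothetical module name, for illustration only
cimport tree                    # lxml's internal libxml2 declarations (tree.pxd)
from lxml.etree cimport DTD     # resolved through the hand-written lxml.etree.pxd

def element_names(DTD dtd not None):
    """Return the names of all elements declared in the given DTD."""
    cdef tree.xmlNode* c_node
    cdef bytes raw_name
    names = []
    if dtd._c_dtd is NULL:
        return names
    # Assumption: tree.pxd exposes xmlDtd.children, the linked list of
    # declaration nodes that libxml2 keeps for a parsed DTD.
    c_node = dtd._c_dtd.children
    while c_node is not NULL:
        # Skip attribute, entity and notation declarations.
        if c_node.type == tree.XML_ELEMENT_DECL:
            # libxml2 stores names as UTF-8 encoded C strings.
            raw_name = <char*>c_node.name
            names.append(raw_name.decode('utf-8'))
        c_node = c_node.next
    return names
```

From Python, this would be called on an ordinary lxml.etree.DTD instance,
along the lines of dtdnames.element_names(etree.DTD(f)), with f being an
open DTD file.
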

I know that this is harder than necessary (thanks for bringing this up,
BTW), but that's just because DTD isn't an 'officially' C-exported type,
and neither are any of the other schema types.

Stefan