Re: XInclude and entities
Michael Glavassevich <mrglavas <at> ca.ibm.com>
2007-09-20 03:09:28 GMT
Hi Stuart,
"Stuart Norton" <snorton <at> juniper.net> wrote on 09/19/2007 04:57:18 PM:
> I have a question about how Xerces handles text entities when they
> are defined in both the parent and child of XInclusion. Are text
> entities supposed to be expanded before or after XIncludes are
> inserted in the result infoset?
XML parsers are required [1] to do this expansion. Conceptually this
happens while each document is parsed, before its source infoset is built.
> Based on a local experiment with Xerces-J 2.9.0, it appears that
> text entities are expanded before XIncludes are inserted. But
> according to the recommendation at http://www.w3.org/TR/xinclude/:
> "The included items will all appear in the result infoset. This includes
> unexpanded entity reference information items if they are present."
> I read this to mean that entity reference information items (e.g.
> text entities) are not expanded until after they are included in the
> result document (but I could be wrong).
That's not what that means.
An "unexpanded entity reference" is a term defined in the XML Information
Set [2] Recommendation. These information items represent external parsed
entities (e.g. <!ENTITY foo SYSTEM "http://xerces.apache.org/bar">) that
weren't expanded by the parser. Non-validating parsers [3] in particular may
leave such references unexpanded. Xerces expands all of them by default.
XInclude processing plays no part in that. If there was an unexpanded entity
reference in the source infoset, it's still unexpanded when it's included in
the result infoset.
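
The ordering described above (each document's entities are expanded by its
own parse, and only afterwards are the infosets merged) can be sketched with
a small stand-in experiment. This is not Xerces: it uses Python's standard
library xml.etree.ElementInclude purely as an illustration, and the document
contents, the element names and the load_child helper are hypothetical
stand-ins for the attached files.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical stand-ins for the attached files. Each document declares
# its own internal entity named text-entity.
CHILD_XML = """<!DOCTYPE child [<!ENTITY text-entity "CHILD">]>
<child>&text-entity;</child>"""

PARENT_XML = """<!DOCTYPE parent [<!ENTITY text-entity "PARENT">]>
<parent xmlns:xi="http://www.w3.org/2001/XInclude">
  <para>&text-entity;</para>
  <xi:include href="child.xml"/>
</parent>"""


def load_child(href, parse, encoding=None):
    # In-memory loader (an assumption for this sketch): the child is parsed
    # on its own, so its entity reference is expanded against the child's
    # own DOCTYPE before anything is merged into the parent.
    if href == "child.xml" and parse == "xml":
        return ET.fromstring(CHILD_XML)
    return None


# Parsing the parent expands &text-entity; using the parent's declaration.
root = ET.fromstring(PARENT_XML)

# XInclude processing then replaces xi:include with the already-parsed child.
ElementInclude.include(root, loader=load_child)

parent_text = root.find("para").text
child_text = root.find("child").text
print(parent_text, child_text)
```

Under these assumptions the parent content keeps PARENT and the included
content keeps CHILD, because neither DOCTYPE is visible to the other
document's parse.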
> I would really appreciate it if someone could explain what the
> expected behavior is.
>
> In case it helps to clarify my question, I have attached three XML
> files in a zip file. parent-xinclude-entity-text.xml uses xinclude
> to include child-xinclude-entity-text, and they both define and use
> their own version of &text-entity; ("PARENT" in the parent file and
> "CHILD" in the child file). After parsing, the result is
> parent-xinclude-entity-text-out.xml, and you see that the text entity was
> expanded to "PARENT" in the content from the parent file, and to
> "CHILD" in the content from the child file. My expectation was that
> it should have been expanded to "PARENT" in both cases, because the
> parent's entity definition is included first and it overrides the
> child's.
>
> Thank you in advance!
>
> Stuart Norton
> Document Engineering
> Juniper Networks, Inc.
Thanks.
[1] http://www.w3.org/TR/2006/REC-xml-20060816/#entproc
[2] http://www.w3.org/TR/2004/REC-xml-infoset-20040204/#infoitem.rse
[3] http://www.w3.org/TR/REC-xml/#include-if-valid
Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas <at> ca.ibm.com
E-mail: mrglavas <at> apache.org