Micah Dubinko | 1 Aug 2009 01:06

Pragmatic namespaces

Literally for years, people have been talking about how great it would  
be to use something like Java-style namespaces in XML instead of the  
current xmlns regime. For example <http://www.xml.com/pub/a/2005/04/13/namespace-uris.html>.

By scoping the solution down to just HTML syntax, I believe a  
reasonable solution can be crafted, and now that the W3C is focusing  
on "distributed extensibility" as a requirement for HTML5, the timing  
seems right to see how far a proposal in this direction can go. On the  
other hand, if this proposal doesn't work out, maybe it will  
permanently end the musings about how great Java-style namespaces  
would be.

I'm posting this on xml-dev for community input and feedback. I have  
no current association with the HTML Working Group, and this is my  
personal project, with no reflection on my employer. The format is an  
intermingling of requirements and proposed solutions. This is largely  
inspired by Tom Bradford's "clean namespaces" proposal, the  
archive.org version of which I linked to previously.

Requirement: Ask not if it is good enough, ask if it can be popular  
enough.

(Thanks to Douglas Crockford for the quote). This proposal will  
horrify the purists, but that's OK.

Requirement: this solution must not interfere with existing HTML  
elements or attributes

Point 1:

COUTHURES Alain | 1 Aug 2009 12:35

Re: Pragmatic namespaces

Hey Micah,

I was just having time to start to think about namespaces ;-)

Let me first say that I now love namespaces. Like many, my first years with XML were disturbed by namespaces: I didn't understand them, and many errors occurred because of this. It was like dark magic: add this and that, add more if it still doesn't work... The no-namespace capability is even more difficult to understand because it has to be explicit! I understand that HTML writers, who are usually not programmers, hate namespaces, considering them useless (the element name is explicit, isn't it?) and a cause of problems.

I wouldn't say that today everything is perfect for me:
  • having a big list of xmlns attributes at the document element is horrible; they are almost always the same (same prefix, same URI), and when I try to reduce their number, I end up needing one of them later...
  • upgrading a namespace sometimes means changing every declaration of it because the URI changed
  • copy/paste between documents that use different prefixes for the same namespace requires delicate text editor substitutions
  • misspellings in URIs occur (URIs are too long, they should not contain a year, ...) and are usually difficult to detect immediately
  • misspellings in element or attribute names in a specific namespace occur too...
Misspelling is, for me, the most important concern because I'm just using a smart text editor (Notepad++ is free and fast). With a rendering language such as HTML or SVG, it's easy to locate what is wrong because it can be seen but, with XPath, there is no "unknown" element, just "not found" elements!

Even some widespread XML libraries and engines have difficulties with namespaces !

My first remark about Java-style namespaces is about a . or a : as separator. The dot is now allowed in element names and I have already seen some XML notations using it. I'm not sure it's a good practice anyway and . as separator sounds like Java definitely.
Requirement: Ask not if it is good enough, ask if it can be popular enough.

(Thanks to Douglas Crockford for the quote). This proposal will horrify the purists, but that's OK.
Yes, it's a very important point but I wouldn't like to reduce XML possibilities either. Easy for non-programmers, powerful for others : is it possible ?

Requirement: this solution must not interfere with existing HTML elements or attributes

Point 1:
Any element name with no dots in it is treated as HTML (including HTML rules on handling unrecognized elements)
This might be a problem for XForms instance data. XSLTForms doesn't generate an error when there is no prefix and no xmlns="" in the instance data and, when people send me their forms for support, I usually see that they don't bother with that. Yes, non-programmers try to write programs sometimes...

Requirement: this solution must allow for distributed creation of globally-unique namespace names (including those outside of a consensus process)

Point 2:
Any element with one or more dots in it is treated as an extension element. The portion after the last dot is considered the localname, and the portion up to but not including the last dot is parsed as the pragmatic namespace name (or pname for short). Interfaces with existing namespace-aware APIs must treat the pname as the namespace URI. With the exception noted below, to prevent clashes pnames must be based on reversed DNS names.

Example:
<head>
  <title>Document title</title>
  <com.example.project>
    <com.example.id>123521123</com.example.id>
  </com.example.project>
</head>

In this example document.getElementsByTagName("id") would return the innermost element.
So would document.getElementsByTagNameNS("com.example", "id")
OK. It sounds good !
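
A minimal sketch of the Point 2 splitting rule, assuming names are handled as plain strings and ignoring the grandfathered prefixes of Point 4. The function name `split_extension_name` is hypothetical and does not come from the proposal.

```python
def split_extension_name(tag_name):
    """Split a dotted element name into (pname, localname).

    A name without any dot is plain HTML under Point 1, so its pname is None.
    """
    # rpartition splits at the LAST dot, as Point 2 requires.
    pname, dot, localname = tag_name.rpartition(".")
    if not dot:
        return None, tag_name
    return pname, localname


print(split_extension_name("com.example.project"))  # ('com.example', 'project')
print(split_extension_name("title"))                # (None, 'title')
```

Splitting at the last dot is what lets a reversed DNS name of any depth act as the pname.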

Requirement: it is highly desirable to produce a document that will produce the same element names in HTML or XML

Point 3:
Zero or more special attributes of the form using.<pname> may appear on the root element, and ONLY on the root element. The declarations have document-wide scope. The pname that appears after "using." is the one being declared. The value of the attribute is a space-separated list of localnames that represent boundary elements, in other words, upon reaching a boundary element, a new namespace gets applied to that element and all children (until encountering another boundary element).

Example equivalent to the previous:
<html using.com.example="project">
<head>
  <project>
    <id>123521123</id>
  </project>
</head>

This structure will produce the same element names in an XML parser, and a straightforward transformation could convert it to true XML+Namespaces.
ONLY on the root element is a problem for generic XML tools. The Component Manager I wrote for XSLTForms development environment can work for any XML document with its own namespaces, unknown by the Component Manager itself. With XSLT, xsl:copy-of can be used for a node from an external document, the stylesheet doesn't have to know each and every namespaces.

It's also a problem for XForms instance data.

A very good aspect of namespaces is to allow to mix data without disturbing the programs interested in just one namespace !

Why not just say that, usually, using.<pname> attributes are on the root element ?
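
The boundary-element rule of Point 3 can be sketched roughly as follows. This assumes the document is well-formed enough for an XML parser, reads using.<pname> attributes from the root only, and leaves out the HTML default boundaries of Point 5 and explicitly dotted names. `boundary_map` and `assign_namespaces` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET


def boundary_map(root):
    """Map each declared boundary localname to its pname (root element only)."""
    mapping = {}
    for attr, value in root.attrib.items():
        if attr.startswith("using."):
            pname = attr[len("using."):]
            for localname in value.split():
                mapping[localname] = pname
    return mapping


def assign_namespaces(root):
    """Return (localname, pname) pairs in document order.

    A boundary element switches the namespace for itself and its
    descendants until another boundary element is reached.
    None stands for "treated as HTML".
    """
    boundaries = boundary_map(root)
    result = []

    def walk(element, inherited):
        current = boundaries.get(element.tag, inherited)
        result.append((element.tag, current))
        for child in element:
            walk(child, current)

    walk(root, None)
    return result


doc = ET.fromstring(
    '<html using.com.example="project">'
    "<head><title>Document title</title>"
    "<project><id>123521123</id></project>"
    "</head></html>"
)
for name, pname in assign_namespaces(doc):
    print(name, pname)
```

Allowing the declarations on any element, as suggested above, would mean building the map per element while walking instead of once at the root.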

Requirement: widely-known namespaces must parse to an equivalent DOM as xmlns

Point 4:
In any extension element with only one dot, the token before the first dot is treated specially. Specifically, there exists a list of grandfathered namespaces, and associated namespace URIs. Interfaces with existing namespace-aware APIs must treat the grandfathered namespace URI as the namespace URI of the extension element.

Here is the list: (additional suggestions welcome)

atom http://www.w3.org/2005/Atom
docbook http://docbook.org/ns/docbook
html http://www.w3.org/1999/xhtml
math http://www.w3.org/1998/Math/MathML
svg http://www.w3.org/2000/svg
xbl http://www.mozilla.org/xbl
xbl2 http://www.w3.org/ns/xbl
xforms http://www.w3.org/2002/xforms
xlink http://www.w3.org/1999/xlink
xml http://www.w3.org/XML/1998/namespace

Example:

<html using.math="math">...
<p>
E.g. <math><msqrt><mi>π</mi></msqrt></math>
</p>
...</html>

In this example document.getElementsByTagName("mi") would return the innermost element.
So would document.getElementsByTagNameNS("http://www.w3.org/1998/Math/MathML", "mi")
It's pragmatic.
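
A rough sketch of how Point 4 could sit on top of the Point 2 split, assuming a lookup table holding part of the proposed list. The name `namespace_for` is hypothetical. The MathML URI is written here without a trailing slash, which is how the MathML specification defines it.

```python
# Subset of the grandfathered list from the proposal.
GRANDFATHERED = {
    "atom": "http://www.w3.org/2005/Atom",
    "html": "http://www.w3.org/1999/xhtml",
    "math": "http://www.w3.org/1998/Math/MathML",
    "svg": "http://www.w3.org/2000/svg",
    "xforms": "http://www.w3.org/2002/xforms",
    "xlink": "http://www.w3.org/1999/xlink",
}


def namespace_for(tag_name):
    """Return (namespace, localname) for an element name.

    No dot: plain HTML (namespace None).
    Exactly one dot with a grandfathered token: the registered URI.
    Anything else: the pname itself serves as the namespace name.
    """
    pname, dot, localname = tag_name.rpartition(".")
    if not dot:
        return None, tag_name
    if "." not in pname and pname in GRANDFATHERED:
        return GRANDFATHERED[pname], localname
    return pname, localname


print(namespace_for("math.mi"))
print(namespace_for("com.example.id"))
```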

This kind of list should not change frequently. It sounds like more than reserved prefixes.

I, personally, would like to add

xsl http://www.w3.org/1999/XSL/Transform
xsd http://www.w3.org/2001/XMLSchema
xsi http://www.w3.org/2001/XMLSchema-instance
xbrl http://www.xbrl.org/2003/instance
xbrll http://www.xbrl.org/2003/linkbase
ifrs http://xbrl.iasb.org/taxonomy/2008-03-01/ifrs
...

Big companies, such as Microsoft, would probably have their own items without asking others if they agree...

Managing a short list will not be easy...

One solution could be to limit it to namespaces frequently used in HTML documents only.


Requirement: must support HTML nested inside an extension vocabulary.

Point 5:
Unless overridden, HTML documents are treated as if all localnames explicitly listed in the specification are HTML boundary elements.

Example:
<html using.svg="svg">
  <body>
    <svg version="1.1"  viewBox="0 0 100 100" preserveAspectRatio="xMidYMid slice">
      <rect x="10" y="10" width="100" height="150" fill="gray"/>
      <foreignObject x="10" y="10" width="100" height="150">
        <body>
          <div>Here is a <strong>paragraph</strong>.</div>
        </body>
      </foreignObject>
    </svg>
  </body>
</html>

Here the inner body element and its children are still treated as HTML.

Another example:
<html using.xforms="model select1 range secret">
  <head>
    <model>...</model>
  </head>
  <body>
    <xforms.input>...
  </body>
</html>

In this case, "input" is already used as an HTML element name, so uses of it--even with the using statement at the top--need to be explicitly spelled out. Of course, the author could have overridden this by including "input" in the using statement, but then any regular HTML input controls would need to be spelled <html.input>. Just like in Java.
Yes !
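
The override behaviour in Point 5 can be illustrated with a small sketch. It assumes a tiny stand-in for the full set of HTML element names and a dictionary in place of the using.<pname> attributes. `HTML_NAMES` and `effective_boundaries` are hypothetical names.

```python
# Tiny illustrative subset; the real set would be every localname in the HTML spec.
HTML_NAMES = {"html", "head", "body", "div", "p", "strong", "input"}


def effective_boundaries(using):
    """Combine the HTML defaults with the declared boundary elements.

    using maps a pname to the list of localnames declared for it.
    """
    boundaries = {name: "html" for name in HTML_NAMES}
    for pname, localnames in using.items():
        for localname in localnames:
            # An explicit declaration overrides the HTML default.
            boundaries[localname] = pname
    return boundaries


default = effective_boundaries({"xforms": ["model", "select1", "range", "secret"]})
overridden = effective_boundaries(
    {"xforms": ["model", "select1", "range", "secret", "input"]}
)
print(default["input"])     # html
print(overridden["input"])  # xforms
```

In the overridden case a regular HTML input control would have to be written as <html.input>, as described above.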

That's the entire proposal.
Great !

In practice, it may be inevitable that browser makers might bake in additional defaults, like
using.math="math mi mo ms mn mtext"
such that users can freely use chosen vocabularies with zero additional markup. Support for this outcome is an additional feature of this proposal.
But compatibility problems would occur between different browsers and different versions of the same browser. It's not a non-programmer concern, is it?

Thank you for this proposal. Yes, something has to be done, and it sounds much easier this way.

Best regards,

-Alain
fmeschini@tin.it | 1 Aug 2009 13:45

XML Parsers and Processors

Dear List,

For research purposes I am looking for papers, articles and tutorials about XML parsers and processors. They should not be about their use, but rather about implementation issues, such as algorithms, data structures, context-free grammars and so on. I have already googled but did not find anything very relevant. Michael Kay already kindly sent me the link to his IBM developerWorks article on Saxon and it was very useful. Does anyone know of something similar for XML parsers?

Thanks in advance
Federico Meschini

Amelia A Lewis | 1 Aug 2009 16:43

Re: Pragmatic namespaces

A few comments.  :-)

On Fri, 31 Jul 2009 16:06:46 -0700, Micah Dubinko wrote:
> Requirement: this solution must not interfere with existing HTML 
> elements or attributes
> 
> Point 1:
> Any element name with no dots in it is treated as HTML (including 
> HTML rules on handling unrecognized elements)

In fact, in the xforms example, below, using "input", it is suggested 
that the corresponding HTML element must then become "html.input".

> Requirement: this solution must allow for distributed creation of 
> globally-unique namespace names (including those outside of a 
> consensus process)
> 
> Point 2:
> Any element with one or more dots in it is treated as an extension 
> element. The portion after the last dot is considered the localname, 
> and the portion up to but not including the last dot is parsed as 
> the pragmatic namespace name (or pname for short). Interfaces with 
> existing namespace-aware APIs must treat the pname as the namespace 
> URI. With the exception noted below, to prevent clashes pnames must 
> be based on reversed DNS names.

Potentially problematic for any dialect that already uses dotted-on 
names.  However, the chance of ambiguity is minimal.  If there's 
com.example.project, com.example.id, and com.example.project.id (the 
latter being a reference to an id child of a project, tagname 
project.id), it remains unambiguous.  I see little chance of 
reverse-DNS dot-ons creating a clash between separately administered 
namespaces, which is the crucial point.

Another option: double-dot: com.example..project, com.example..id, 
com.example..project.id.

> Requirement: it is highly desirable to produce a document that will 
> produce the same element names in HTML or XML
> 
> Point 3:
> Zero or more special attributes of the form using.<pname> may appear 
> on the root element, and ONLY on the root element. The declarations 
> have document-wide scope. The pname that appears after "using." is 
> the one being declared. The value of the attribute is a 
> space-separated list of localnames that represent boundary elements, 
> in other words, upon reaching a boundary element, a new namespace 
> gets applied to that element and all children (until encountering 
> another boundary element).

Okay, I simply don't think the "only" root requirement is feasible.

I produce XHTML documents; doing so means that I can process them in 
ways that are simply impossible for HTML.  Assuming that this addition 
of pragmatic namespaces is intended (in part) to permit a similarly 
robust processor (that is, more than just renderers intended for use in 
a browser) infrastructure to develop, then it ought to be possible to 
merge documents (server-side includes, for instance; various 
programming languages such as PHP and similar that may produce 
"fragments" which are then concatenated into a single document (that 
is, the fragments are designed for re-use in multiple different 
documents)).

I'd recommend simply removing the "root only" restriction.  "using" 
acts within the scope of the element it is an attribute of.

> Requirement: widely-known namespaces must parse to an equivalent 
> DOM as xmlns
> 
> Point 4:
> In any extension element with only one dot, the token before the 
> first dot is treated specially. Specifically, there exists a list of 
> grandfathered namespaces, and associated namespace URIs. Interfaces 
> with existing namespace-aware APIs must treat the grandfathered 
> namespace URI as the namespace URI of the extension element.
> 
> Here is the list: (additional suggestions welcome)
> 
> atom http://www.w3.org/2005/Atom
> docbook http://docbook.org/ns/docbook
> html http://www.w3.org/1999/xhtml
> math http://www.w3.org/1998/Math/MathML
> svg http://www.w3.org/2000/svg
> xbl http://www.mozilla.org/xbl
> xbl2 http://www.w3.org/ns/xbl
> xforms http://www.w3.org/2002/xforms
> xlink http://www.w3.org/1999/xlink
> xml http://www.w3.org/XML/1998/namespace

I just don't like this.

For one thing, using "xml." as a prefix is illegal in XML.  Avoid.

This is, in effect, a prefix registry.  It has all the freedom from 
politics, agility, flexibility, and consensus support of any other 
registry (did my sarcasm tags show up?).  I'd recommend simply dropping 
this.

However, that brings up an issue, to my mind.  How would one indicate 
the MathML namespace using only a reversed domain name?

We have a number of existing namespaces.  Let's transform those to 
reverse-dns dotted-on forms.  This is slightly more verbose, but it 
would work:

org.w3.www.1998.Math.MathML

Assuming that the org.w3 owner owns the domain, then this could, in 
principle, resolve to the canonical URL (on an assumption of "http" as 
the URL scheme; add a CNAME, add a vhost for MathML.Math.1998.www that 
redirects to www/1998/Math/MathML).

This in turn seems to suggest that double-dotting to separate element 
names is more significant (because otherwise ambiguity is potentially 
introduced, as if one were to regard 1999 as the namespace for both 
xhtml and xlink, rather than 1999.xhtml 1999.xlink).

Having a "standard" transform for HTTP URLs (which are to a significant 
degree the most common, in terms of scheme, for XML namespaces) would 
be useful, to the point that I'd be close to suggesting it as a 
requirement: there must be a standard, straightforward transform for 
existing XML namespace identifiers to the pragmatic namespace style.
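
One possible shape for such a transform, sketched under assumptions: only http(s) namespace URIs are handled, the host labels are reversed, and the non-empty path segments are appended in order. That rule is read off the single org.w3.www.1998.Math.MathML example above, and `http_uri_to_pname` is a hypothetical name.

```python
from urllib.parse import urlsplit


def http_uri_to_pname(uri):
    """Turn an http(s) namespace URI into a reverse-DNS dotted name."""
    parts = urlsplit(uri)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError("this sketch only handles http(s) namespace URIs")
    # hostname is already lowercased by urlsplit; the path keeps its case.
    host_labels = list(reversed(parts.hostname.split(".")))
    path_segments = [segment for segment in parts.path.split("/") if segment]
    return ".".join(host_labels + path_segments)


print(http_uri_to_pname("http://www.w3.org/1998/Math/MathML"))
# org.w3.www.1998.Math.MathML
print(http_uri_to_pname("http://www.w3.org/1999/xhtml"))
# org.w3.www.1999.xhtml
```

The sketch is not reversible when a path segment itself contains a dot, and it drops any query or fragment, so a real transform would need to settle those cases.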

> Requirement: must support HTML nested inside an extension vocabulary.
> 
> Point 5:
> Unless overridden, HTML documents are treated as if all localnames 
> explicitly listed in the specification are HTML boundary elements.

But you do not say how this is "overridden," unless you mean ...

> <html using.org.w3.www.2002.xforms="model select1 range secret input">
>   <head>
>     <model>...</model>
>   </head>
>   <body>
>     <input>...</input><!-- xforms -->
>     <html..input>...</html..input> <!-- html -->
>   </body>
> </html>

Which probably is what you mean.  Okay.  (Well, except that I've 
changed the example to use namespaces mapped to reverse DNS, and used 
the double-dot for the html "prefix").

Care to hear a suggestion that reeks of SGML minimization, but that 
might make these pragmatic namespaces more acceptable?

Permit an element to be closed without its "prefix".  It's something 
I've wished for in XML, for that matter:

<com.example..project>la, la</project>

I can't imagine a situation in which this is ambiguous.  Even in XML, 
using QNames and prefixes:

<a:element>
  <b:element>
    <c:element></element>
  </element>
</element>

... is perfectly clear and unambiguous; and changes nothing about 
well-formedness.  Call it namespace minimization.  :-)
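
To make the idea concrete, here is a sketch of a relaxed tag matcher in which a closing tag may give either the full name or just the local part. This is a hypothetical check, not XML well-formedness as the XML specification defines it. The regex tokenizer is deliberately naive (it does not cope with ">" inside attribute values), and `local_part` and `is_balanced` are made-up names.

```python
import re

# Matches start tags, end tags and empty-element tags; comments and PIs are skipped.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.:-]*)[^<>]*?(/?)>")


def local_part(name):
    """Strip a double-dot pragmatic prefix or an XML prefix from a name."""
    if ".." in name:
        return name.rsplit("..", 1)[1]
    return name.rsplit(":", 1)[-1]


def is_balanced(markup):
    """Check nesting, letting an end tag omit the prefix of the open element."""
    stack = []
    for closing, name, self_closing in TAG.findall(markup):
        if closing:
            if not stack:
                return False
            opened = stack.pop()
            # Accept the full name, or only its local part.
            if name != opened and name != local_part(opened):
                return False
        elif not self_closing:
            stack.append(name)
    return not stack


print(is_balanced("<com.example..project>la, la</project>"))  # True
print(is_balanced("<a:element><b:element></element></element>"))  # True
print(is_balanced("<a:element></other>"))  # False
```

Because the stack always knows which element is open, the prefix on the end tag carries no extra information for the matcher.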

> In practice, it may be inevitable that browser makers might bake in 
> additional defaults, like
> using.math="math mi mo ms mn mtext"
> such that users can freely use chosen vocabularies with zero 
> additional markup. Support for this outcome is an additional feature 
> of this proposal.

Ick.

That suggests that the "prefix registry" is central to acceptance by 
browser makers, while I find it the least convincing part of the 
proposal.

Amy!
--

-- 
Amelia A. Lewis                    amyzing {at} talsever.com
A hundred thousand lemmings can't be wrong.

_______________________________________________________________________

XML-DEV is a publicly archived, unmoderated list hosted by OASIS
to support XML implementation and development. To minimize
spam in the archives, you must subscribe before posting.

[Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
Or unsubscribe: xml-dev-unsubscribe <at> lists.xml.org
subscribe: xml-dev-subscribe <at> lists.xml.org
List archive: http://lists.xml.org/archives/xml-dev/
List Guidelines: http://www.oasis-open.org/maillists/guidelines.php

Kurt Cagle | 2 Aug 2009 20:18

XML Today Back Online

XML Today (http://www.xmltoday.org) is officially back online. I've created the site to act as a forum for the XML community, with searchable aggregated news and job listings, blog posts, an SMF Forum and plans for code repositories. Please feel free to post conference events or related content onto the site as well.

Kurt Cagle
Managing Editor
http://xmlToday.org

Kurt Cagle | 2 Aug 2009 20:55

Re: Pragmatic namespaces

Although overall I'm in agreement with Micah on the implementation, I still think that there should be a push back on the HTML community for namespace support - or this particular architecture needs to be extended to the XHTML 5 version as well. The danger I see is that you'll continue to see vendors choose to support only the HTML version (thinking IE8 here specifically) even though both versions should technically be supported, because of the potential of creating lock-in on the default namespace set.

One additional possibility here might be to introduce a new HTML 5 header element called <extends>:

<extends namespace="namespaceURI" prefix="prefix" uses="elementSeq|*">

Yes, this is a rogue namespace. The @uses attribute would contain either a list of supported elements or would default to the total set of elements (the default if @uses is not included). Elements in this namespace would be used sans prefix identifiers, unless the need to resolve conflicts arose, at which point it would take the form prefix.element. A Dublin Core element would then be added as:

<extends namespace="http://dublincore.org/documents/dcmi-namespace/" prefix="dc" uses="*"/>

or

<extends namespace="http://dublincore.org/documents/dcmi-namespace/" prefix="dc"/>

Then inline you could have the <creator> or <editor> element being added into the mix. If there was a collision with an extant namespace, then you'd default back to <dc.creator>.


This would be on top of the core arrangement discussed by Micah - in essence, it does not penalize developers who DO need advanced functionality simply because of the Granny Coder argument.

Kurt Cagle | 2 Aug 2009 21:30

Re: Pragmatic namespaces

Amelia,

The previous post was intended for the list overall (as is this post), the only flaw in mailman type architectures.

However, concerning your post, I agree strongly with you about the need to avoid namespace registries, which is the danger that I see in any "default" mechanism. Its potential to fragment the web is disturbing, especially as it effectively puts the decision about what technologies to keep or avoid solely in the hands of the browser vendors.

Overall, I'm going to raise this question again - what exactly is it about namespaces that the HTML crowd doesn't like? If it's the use of complex namespace URIs, then frankly the ideal solution to that is to provide guidance on what constitutes a good web URI. If it's the requirement of using prefixes, then an extension of Micah's pragmatic namespaces solution seems to be a good start, so long as there is a formal mechanism for ensuring that ANY namespace can be introduced in this manner.

However, if it is simply a desire by a group of people (notably the WHATWG group) to control the standard at its most conservative, then nothing that the XML community does, no matter how well intentioned, will make any difference. This becomes a formal W3C matter (which it ultimately should be) - not Google, not Ian Hixie, not any of us here individually ... or has the W3C's focus on the Semantic Web blinded it to the fact that its initial, primary and ultimate mandate was to act as the custodian of the HTML standard?

I'm sorry about being harsh about this, but frankly the whole issue is beginning to piss me off. As far as I'm concerned, by allowing the HTML 5 process to move forward in the first place, there is an open, tacit admission that the SGML DTDs underlying HTML are once again open for modification. Maybe this is the time to incorporate namespaces into the formal DTD, since the DTD emerged before namespaces did. If a different notation is needed for backward compatibility, that's fine, but this unthinking idiocy of feeling that namespaces in some form should not be a part of HTML is just politics for the sake of control.

The language NEEDS an extension mechanism. There are more than 10,000 different XML vocabularies currently in existence, and HTML is still, far and away, the primary carrier for the bulk of them. The whole AJAX movement has the potential, with XBL or otherwise, to provide behavioral support for those extension elements, as appropriate, and without this philosophy in place, we will just see the unabated movement towards JavaScript becoming a morass of APIs that destroy the whole notion of declarative architectures.

I think we should pursue Micah's proposals, but frankly even at the Extensibility F2F in September it should ... it must ... include an open-ended extensibility model as an absolute minimum requirement ... and that the W3C should decide as a body whether it wishes to control the future of HTML or cede that authority to a handful of vendors. Because if it chooses to cede this point, then for all intents and purposes the XML movement is dead.

Kurt Cagle
Managing Editor
XMLToday.org

Amelia A Lewis | 3 Aug 2009 03:44

Re: Pragmatic namespaces

Well, I'll offer some remarks in response, but I hope that others will 
join the conversation.

On Sun, 2 Aug 2009 12:30:58 -0700, Kurt Cagle wrote:
> However, concerning your post, I agree strongly with you about the need to
> avoid namespace registries, which is the danger that I see in any "default"
> mechanism. Its potential to fragment the web is disturbing, especially as
> it effectively puts the decision about what technologies to keep or avoid
> solely in the hands of the browser vendors.

Strongly agreed.  In that regard, I'm deeply reluctant to accept the 
"shortcuts" (registry) that Micah proposed, because it seems to me that 
these would soon become the only things supported.

Now ... even something like Flash, now so widespread, would have had no 
chance of adoption and uptake without *extensibility* (and I will 
perhaps be excused for emphasizing the word, since I am prouder of 
having worked for the company of that name than of any other).  I will 
grant that IE offers a poor platform for namespace-based (or 
equivalent) extensibility, but it seems to me that in order to enable 
the future of the web, to make it a place where small, dedicated groups 
can introduce something game-changing, that extensibility paradigm is 
of paramount importance.

> Overall, I'm going to raise this question again - what exactly is it about
> namespaces that the HTML crowd doesn't like? If it's the use of complex

I think it's verbosity as much as complexity.  You will note, I hope, 
the "namespace minimization" that I mentioned in my post; why should I 
have to tell the processor the *namespace prefix* of the element that 
I'm closing any more than I should have to repeat the attributes of 
that element?  I'm persuaded that permitting those to be dropped would 
have no impact on well-formedness (although I admit that discussions of 
minimization are likely to lead into a swamp, because well formedness 
and minimization are clearly at odds, in a large number of cases that 
can't be dismissed as "corner").  Arguably, XML's verbosity 
(effectively requiring the equivalent of a comment on every equivalent 
of a closing brace in C } /* if */ } /* while */) is precisely what 
makes it robust enough to have achieved the levels of adoption that it 
has seen.

In terms of verbosity, the idea of using something like XLink rather 
than "href" in attributes (XLink is, in my opinion, vastly 
overspecified/overengineered for the common case, leading to dismissal 
of the opportunities that it provides for more sophisticated usage) is 
equally damning.  And neither DTD nor W3C XML Schema make it easy or 
convenient to say "oh, you can use any attribute from a different 
namespace on any element here".  It's painful in DTD; it's exceedingly 
tedious (and consequently likely to not happen, for at least some 
elements) in WXS.

> namespace URIs, then frankly the ideal solution to that is to provide
> guidance on what constitutes a good web URI. If it's the requirement of

Okay, you've triggered a rant.  Those of you who are partisans of the 
Namespaces in XML Specification are hereby warned: the following will 
annoy you.  I'm going to be offensive (although more offensive to the 
W3C folks who forced URIs onto the Working Group than to the Working 
Group members themselves).

The Namespaces in XML specification claims that an XML Namespace 
identifier "is a URI."  However, if you read the rules in the spec, you 
find that "" (the empty string) is permissible (with a special 
meaning), even though it is not legal in any URI specification BNF you 
care to present, and the statement isn't (specification-ly correct) 
"union of URI and the empty string".  And you find that one namespace 
identifier is compared with another via lexical comparison of the 
strings, which is not how one determines URI "identity" (an area 
admittedly underspecified, in my opinion, but the idea that 
"www.ibm.com" != "www.IBM.com" or "www.w3.org" != "www.W3.org", when DNS 
is explicitly case-insensitive, is clearly problematic).
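
The gap between the two notions of identity can be shown with a short sketch, assuming Python's standard urllib.parse as a stand-in for a URI-aware comparison. Its `hostname` attribute lowercases the host. `same_namespace` and `same_host` are hypothetical names.

```python
from urllib.parse import urlsplit


def same_namespace(a, b):
    """Namespaces in XML: identifiers match only if the strings are identical."""
    return a == b


def same_host(a, b):
    """Compare only the host part, case-insensitively, as DNS would."""
    return urlsplit(a).hostname == urlsplit(b).hostname


lower = "http://www.w3.org/1999/xhtml"
mixed = "http://www.W3.org/1999/xhtml"
print(same_namespace(lower, mixed))  # False
print(same_host(lower, mixed))       # True
```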

Consequently, in an earlier rant to XML-dev, I said that the Namespaces 
in XML specification might be improved by the addition of one word: "is 
a URI" becomes "is not a URI".  Yes, namespace identifiers mimic the 
syntax of URIs, but much of the information carried is discarded.

Can anyone provide an example of a namespace differentiated by scheme?  
I mean, for example:

http://www.example.com/namespace/x
ftp://www.example.com/namespace/y

... where two different namespaces are indicated and distinguished via 
the scheme portion of the URL.  I'd be surprised to see such a thing; 
although it's certainly possible, as soon as I thought of it, I 
labelled it, in my mind, as "exemplar of worst practice," and I suspect 
others would do so as well.  Similarly, distinguishing between 
hierarchical and non-hierarchical schemes is important for URIs, but 
not for namespaces; the indicators that distinguish authority from path 
from query are significant for URIs, but not for distinguishing among 
namespace identifiers produced by a single authority.  URIs contain 
lots of characters that aren't legal as NCName or QName; there's no 
reason that namespace identifiers, carrying equal information about 
authority and distinction of namespace, need do so.

Granted, though http:// (hierarchic HTTP) is the most common form of 
namespace identifier, there are others; the one I've most commonly 
encountered is urn:uuid: (non-hierarchic, but by-design unique, and not 
requiring that someone acquire a domain in order to define a 
namespace).  The latter form of URI does not lend itself to the pattern 
of automatic conversion that I suggested in my previous email, and 
arguably the requirement to have a domain in order to define a 
namespace is establishing a threshold (based on available capital) that 
should not be made.  *shrug*  I don't buy it (but I own several 
domains, personally, and I don't think that <$20/year is a hardship).

The insistence of the W3C on URIs (that aren't, in fact, URIs) as 
namespace identifiers is, to my mind, the worst thing that could have 
happened to XML.  Because the URI specifications are not in the control 
of W3C, and the BNF for URI (however widely ignored in detail) cannot 
drop multiple characters otherwise illegal in XML element names, the 
Namespaces in XML specification was forced to introduce 
namespace-to-prefix mappings, and the subsequent use of prefixes in 
element and attribute content poisoned the well completely.  James 
Clark's (brilliant) suggestion for expanded names, {uri}localname, 
simply never saw adequate adoption (in part, perhaps, because W3C XML 
Schema defined anyURI and NCName and QName, but not ExpName (or JCName 
:-)).
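
For readers who have not met the {uri}localname form: Python's standard xml.etree.ElementTree reports element names this way, which makes for a short illustration. The helper `split_expanded` is a hypothetical name, and the sample document is invented for the example.

```python
import xml.etree.ElementTree as ET


def split_expanded(tag):
    """Split an expanded name of the form {uri}localname into (uri, localname)."""
    if tag.startswith("{"):
        uri, localname = tag[1:].split("}", 1)
        return uri, localname
    return None, tag


# Two different prefixes bound to the same namespace name.
doc = ET.fromstring(
    '<a:root xmlns:a="http://www.w3.org/1999/xhtml" '
    'xmlns:b="http://www.w3.org/1999/xhtml"><b:child/></a:root>'
)
print(doc.tag)                     # {http://www.w3.org/1999/xhtml}root
print(split_expanded(doc[0].tag))  # ('http://www.w3.org/1999/xhtml', 'child')
```

The prefixes a and b disappear from the parsed names, so only the namespace name and the local name remain.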

While the XML 1.0 specification is (a thing of beauty and a joy 
forever) one that I point out to others (whenever I am engaged in that 
horrible perversion, specificating, in company :-), the best that I can 
say for Namespaces in XML is "well, yes, that's clear enough; it can be 
implemented."  Whoever forced URIs on the working group--very likely 
TimBL or Roy Fielding or their disciples--did them a disservice.  XML 
namespace identifiers do not need that generality, and should have 
chosen a representation that allowed compact (but unique, with 
distributed authority) indication within an element name.  Perhaps, 
rather than the dotted-on pattern that Micah has proposed, they could 
have made "qualified names" use colons in place of dots in domain names 
and in place of slashes in paths (com:w3:org:1999:xlink:link).  Still 
verbose (Micah's "using" syntax is rather ... nice), but not as painful 
as NiXML.

The fundamental point there: the issue is one of making a distributed 
mechanism in which independent authorities can establish names (and 
namespaces) without the possibility of name clashes.  Consequently: 
scheme is unnecessary.  Authority *is* necessary, and it's best to 
leverage an *existing* registry, preferably one that anyone using 
computers at this level of proficiency can easily join.  Finally, make 
it possible to reference things in a foreign namespace *without pain*.  
Micah's proposal does that better than I had thought of doing (because 
of his "threshold" elements, which work like the human brain: once I 
start talking MathML, I'm not done talking MathML until I close the 
element, so *leave me alone and don't make me repeat myself*).

> seems to be a good start, so long as there is a formal mechanism for
> insuring that ANY namespace can be introduced in this matter.

I agree that this is fundamental, and also worry (though I'm not 
familiar with HTML 5 or WHATWG people sufficiently to judge) that the 
intent of HTML 5's restriction of permitted namespaces is to control 
(that is, to limit) extensibility.

> The language NEEDS an extension mechanism.
> The language NEEDS an extension mechanism.
> The language NEEDS an extension mechanism.

You won't mind if I quote this multiple times, will you?  I can't think 
of a better way to indicate how much importance I accord to it.

> The language NEEDS an extension mechanism.
> The language NEEDS an extension mechanism.
> The language NEEDS an extension mechanism.

See?

I really, really wish that someone from the HTML 5 working group would 
come forward with an indication of what, in the WG's opinion, the fatal 
flaws of XML namespaces are.  I can guess at a number of them (the fact 
that many, many people new to XML cannot understand that elements are 
scoped by the schema (and namespace-qualified) while attributes are 
scoped by the element (and hence unqualified) by default is one; 
verbosity, incomprehensibility ... well, I'm not a big fan of 
Namespaces in XML apart from the rather insipid encomium, "yes, that 
can be implemented"), but ... I'd like to see HTML 5's "non-XML" syntax 
permit a lossless transformation into the XML syntax and back.  It 
doesn't need *XML* namespaces to do that, but it does need ... 
something with the good qualities of Namespaces in XML.
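
The element/attribute asymmetry mentioned above is easy to see with
Python's standard xml.etree.ElementTree. This is only a small sketch
using a made-up MathML fragment: the default namespace declaration
qualifies the element and its descendants, while the unprefixed
attribute stays in no namespace at all.

```python
import xml.etree.ElementTree as ET

doc = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">'
    "<mi>x</mi>"
    "</math>"
)
root = ET.fromstring(doc)

print(root.tag)           # {http://www.w3.org/1998/Math/MathML}math
print(root[0].tag)        # {http://www.w3.org/1998/Math/MathML}mi
print(dict(root.attrib))  # {'display': 'block'}
```

The child element inherits the default namespace without repeating the
declaration, but "display" is reported as a bare name.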

> chooses to cede this point, then for all intents and purposes the XML
> movement is dead.

Oh ... I can't really agree with that.  I mean, I saw that the HTML 5 
working group was defined *in terms of the DOM* and Boggled and Fell 
Down.  Does anyone who does XML for a living have any respect for the 
DOM APIs?  And yet ... it's clear that those are core to the browser 
experience (which is why they suck so hard for any other usage, in my 
opinion), so it's really perfectly reasonable that the browser folks 
should start from the DOM.  In any other application of XML, a mutable 
tree API is at best a dead weight, but in the browser, it has 
utility--utility to the point of necessity.

Bifurcation?  Certainly.  HTML could become a non-XML dialect.  I'd 
hate that, but it looks as though there's at least a part of the HTML 5 
working group who would welcome it.  Killing XML?  Nah.  The HTML 5 
working group can miss an opportunity (and it seems likely that they 
will), but distancing HTML from XML won't kill either one, it will just 
annoy the folks who have to develop the techniques to reconcile them.

Amy!
--

-- 
Amelia A. Lewis                    amyzing {at} talsever.com
Do you ever feel like putting your fist through a window just so you
can feel something?

_______________________________________________________________________

XML-DEV is a publicly archived, unmoderated list hosted by OASIS
to support XML implementation and development. To minimize
spam in the archives, you must subscribe before posting.

[Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
Or unsubscribe: xml-dev-unsubscribe <at> lists.xml.org
subscribe: xml-dev-subscribe <at> lists.xml.org
List archive: http://lists.xml.org/archives/xml-dev/
List Guidelines: http://www.oasis-open.org/maillists/guidelines.php

mike | 3 Aug 2009 05:25

Why Multipath (LCA) Hierarchical Query Processing Works Automatically in ANSI SQL

ANSI SQL's inherent hierarchical processing, which uses the Left Outer Join to model and process hierarchical structures, is fairly obvious and easy to confirm empirically, and I have covered it in my previous SQL/XML articles. It is less obvious that multipath hierarchical query processing, which requires Lowest Common Ancestor (LCA) processing, also occurs naturally in ANSI SQL: it was never designed into the language and is quite complex to perform. The behavior can be demonstrated empirically from its results, but it would be better to know how and why it works so that we can fully trust those results. I have written an article describing the how and why of natural LCA processing in ANSI SQL, entitled “The Ghost in the Machine”. The link is below.
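
A minimal sketch of the basic Left Outer Join modeling only, not of the LCA argument made in the article. It uses Python's standard sqlite3 module; the tables dept, emp and proj and their contents are hypothetical. The dept table is the root of a two-path hierarchy (dept/emp and dept/proj), so dept is the lowest common ancestor of any emp and proj row.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE emp  (emp_id  INTEGER PRIMARY KEY, dept_id INTEGER, name TEXT);
CREATE TABLE proj (proj_id INTEGER PRIMARY KEY, dept_id INTEGER, title TEXT);
INSERT INTO dept VALUES (1, 'R&D'), (2, 'Sales');
INSERT INTO emp  VALUES (10, 1, 'Ann');
INSERT INTO proj VALUES (100, 1, 'Parser');
""")

# Each LEFT OUTER JOIN hangs one child path under dept; a parent row
# survives with NULLs even when it has no children on a path.
rows = con.execute("""
SELECT d.name, e.name, p.title
FROM dept d
LEFT OUTER JOIN emp  e ON e.dept_id = d.dept_id
LEFT OUTER JOIN proj p ON p.dept_id = d.dept_id
ORDER BY d.dept_id
""").fetchall()

print(rows)  # [('R&D', 'Ann', 'Parser'), ('Sales', None, None)]
```

The 'Sales' row is preserved although it has neither employees nor projects, which is the structure-preserving behavior the hierarchical reading relies on.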

The Ghost in the Machine
http://www.tdan.com/view-articles/11069 

XQuery does not perform this LCA processing automatically today, and it is too complex to do with procedural navigation. The problem has been researched academically, and the attempted solutions use LCA functions that the query user has to insert correctly, which takes away from their ease of use and schema-free purpose. My work has shown that LCA processing can involve nested LCAs, which I do not necessarily see in the XQuery LCA research; that limits those solutions to simpler queries. ANSI SQL, which performs LCA processing automatically, has no multipath LCA query limitations. This was referred to as LCA query processing at least three decades ago.

 Regards,
                  /Mike

Michael M David

Advanced Data Access Technologies, Inc.

www.adatinc.com

rjelliffe | 3 Aug 2009 06:07

Re: Pragmatic namespaces

Kurt Cagle wrote:

> However, if it is simply a desire by a group of people (notably the WHATWG
> group) to control the standard at its most conservative, then nothing that
> the XML community does, no matter how well intentioned, will make any
> difference. This becomes a formal W3C matter (which it ultimately should
> be) - not Google, not Ian Hixie, not any of us here individually ... or has

Unless a broad variety of people participate at W3C, it can be taken in
any direction. Like almost any standards body, participation is the key.

> I'm sorry about being harsh about this, but frankly the whole issue is
> beginning to piss me off. As far as I'm concerned, by allowing the HTML 5
> process to move forward in the first place, there is an open, tacit
> admission that the SGML DTDs underlying HTML are once again open for
> modification. Maybe this is the time to incorporate namespaces into the
> formal DTD, since the DTD emerged before namespaces did. If a different
> notation is needed for backward compatibility, that's fine, but this
> unthinking idiocy of feeling that namespaces in some form should not be a
> part of HTML is just politics for the sake of control.

The syntax of DTDs is not immutable. If the HTML groups would like a form
of DTDs with namespace-awareness, the SGML standard could be changed, as
it was to accommodate XML. Indeed, there already is a specification for
namespace-aware DTDs, prepared as part of ISO DSDL. You can read a draft
at
http://www.dsdl.org/dsdl-9-rev061103.pdf

> The language NEEDS an extension mechanism. There are more than 10,000
> different XML vocabularies currently in existence at the present time, and
> HTML is still, far and away, the primary carrier for the bulk of them. The
> whole AJAX movement has the potential, with XBL or otherwise, to provide
> behavioral support for those extension elements, as appropriate, and
> without this philosophy in place, then we just see the unabated movement
> towards JavaScript becoming a morass of APIs that destroy the whole notion
> of declarative architectures.

I think we need to consider the difference between server-side and
client-side extensions. The XHTML/namespace mechanism seems fine for
server-side extensions, processed at the server. But it has not thrived
for the client-side.

Also, there is in my mind a clear difference between vocabularies that add
different functionality in branches (e.g. MathML, SVG, etc) and
vocabularies that decorate or enhance or interleave with existing HTML
(e.g. RDFa). The former are unignorable, heavyweight and handled by
plugins; the latter are ignorable, lightweight and handled by the normal
HTML mechanism (no namespaces).

A great example of this is the ruby text elements. These are inline,
above-phrase annotations to help with pronunciation or meaning or
abbreviation, used primarily in Japanese text, which has a lot of
homophones or variant readings.

Ruby seemed to be the kind of thing that namespaces would be good at, but
namespaces would add so much extra markup to an already complex structure
(by HTML standards) that namespaces were not feasible. Yet there was also
a desire to keep the ruby elements out of the base HTML standard, which
rather left them in limbo.

So I would suggest that at least some of W3C's problems with namespaces
are of their own creation (without implying blame or second-guessing
them): HTML 4 was in effect frozen where it would have been better to let
it continue to evolve. So namespaces became the only tool for evolution,
and since namespaces were never intended to be the only tool in the
toolbelt (i.e. vocabularies continue to evolve even within a namespace) it
is no surprise they are deficient. HTML 5 addresses this backlog.

> Because if it
> chooses to cede this point, then for all intents and purposes the XML
> movement is dead.

Calm down, Kurt; it would just shift venues. If there is a market
requirement for it, and if W3C is not interested, then people can ask SC34
to put out an ISO XML. But don't forget that there are other XML-using
groups in W3C: SVG, MathML etc. The HTML WG is not the only voice in the
marketplace. (And it is good to have a variety of different voices, even
if some alarm us!)

Cheers
Rick Jelliffe