Re: YAML syntax issues (my proposals for YAML 1.2)
Ingy dot Net <ingy@...>
2006-03-01 06:10:38 GMT
On 28/02/06 21:58 +0200, Kirill Simonov wrote:
> I repost this message, sorry if you get it twice.
>
>
> First I'd like to thank everybody for the support I got here and on
> #yaml.
Well, thanks for rocking harder than Pat Benatar!
> There are some issues with the YAML grammar that I want to discuss here.
Cool. I want to make a few points before we dive into spec changing waters.
1) As it stands today, all changes to the spec require COI-Consensus.
That is full agreement between Clark, Oren, and Ingy. Usually this is
accompanied by some quorum of community consensus, but not always.
2) As one of the 2.5 Major Implementors your voice has major sway. Much
more sway than in the past, since YAML implementations now hold a
weight of their own.
3) JSON is here to stay and is a Good Thing(tm). JSON cannot replace
YAML, because it lacks things like types, references and human
readability. Still, it is great for light cross-language data
exchange. Supporting JSON will serve YAML well.
4) YAML is in practical use *much* more than when the original
decisions were made in the dark. We have implementations, use cases
and real users to weigh against now. Now is more a time for
pragmatism than idealism.
5) Someone needs to actually make the spec changes. In the past this has
been Oren, but I think we can safely say he is distracted :) I want
to open up the spec repository to people interested in helping
maintain it. Oren can still have ultimate veto power if he wishes,
but he doesn't need to bottleneck it.
Clark, Oren... any disagreement here? Anything to add? Oren, you might
need to revert from a pumpkin.
> 1. Should indentation be forced for flow collections? The current spec
> requires flow collections to be indented more than the parent block.
> This makes even legitimate-looking documents like
>
> ---
> simple key: [
> flow, sequence
> ] # <--
>
> ill-formed.
>
> The same question can be asked for this example:
>
> ---
> block:
> another block: [
> flow, collection]
>
> I see the following advantages of allowing such syntax:
> - syck compatibility: syck allows it.
> - Python (a language that has a similar block/flow syntax) allows it.
> - for the scanner, this restriction looks artificial since indentation is
> not needed for parsing flow collections. Thus removing this restriction
> will make the scanner more natural.
>
> On the other hand:
> - it's ugly.
> - no one will really use such syntax. Most likely this indicates a syntax
> error somewhere (unclosed '[' or '{'). Allowing it will make error
> messages more confusing.
>
> So I'm not sure what the best solution is.
I think the answer here is to be lax in what we parse and strict in what we
emit. Emitters should strive to emit in the old style.
> 2. The same question and the same reasons are applicable for quoted
> scalars:
>
> ---
> block:
> block: 'quoted
> scalar'
>
> Again indentation is not needed for the scanner so such syntax can be
> permitted.
Same answer I think...
> 3. The spec requires scalars to be indented with at least one space. It
> seems this rule was introduced to make the check for '---' and '...'
> indicators easier for the cases like
>
> ---
> "quoted scalar
> ... <-- invalid indicator"
>
> But it means that a user is forced to write
>
> ---
> "quoted
> scalar"
>
> which may not look nice. I don't mind adding the check for '---' and
> '...' at the beginning of a line, so I think this restriction can be
> removed. Note that syck does not require it either, although it does not
> check for the indicators.
I defer.
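For reference, a minimal sketch of the line-start check Kirill describes, in Python. The helper name `is_document_indicator` is hypothetical and not taken from syck or any other parser; it assumes an indicator is '---' or '...' at column zero, followed by whitespace or the end of the line.

```python
def is_document_indicator(line: str) -> bool:
    """Return True if `line` begins with '---' or '...' at column zero,
    followed by whitespace or end of line (hypothetical helper)."""
    for marker in ("---", "..."):
        if line.startswith(marker):
            rest = line[len(marker):]
            if rest == "" or rest[0] in " \t\r\n":
                return True
    return False


def first_indicator_line(continuation_lines):
    """Scan the continuation lines of an unindented multi-line scalar and
    return the index of the first one that looks like a document
    indicator, or None if there is none."""
    for index, line in enumerate(continuation_lines):
        if is_document_indicator(line):
            return index
    return None
```

With a check like this, a scanner could accept unindented continuation lines and still reject the '... <-- invalid indicator"' line from the example above.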
> 4. Tab rules are confusing. Well, it's natural since tabs themselves are
> confusing, but I believe the rules are more confusing than necessary.
> I would like to forbid tabs completely, but it seems it's not an option.
> So where is the confusion? It's explained by this example (tab
> is denoted by '^'):
>
> --- # ill-formed document (understandable)
> - ^multi line
> scalar
> --- # again ill-formed document (why?)
> - multi^line
> scalar
> --- # well-formed document
> - multi line^
> scalar
> --- # again well-formed document (hmm...)
> - multi line
> ^scalar
> --- # ill-formed document (?!)
> - multi line
> scalar^
>
> I may be wrong in my interpretation of the production rules though,
> so the above attributions could be wrong. Anyway I'd like to use the
> following rule: tabs cannot be used before block indicators ('-', '?',
> ':', and simple keys) and cannot denote the end of block and plain
> scalars. Well, something like this; I'm not sure that this rule is
> complete or correct. But I'd like to have a rule that can be explained by
> a single sentence.
I wouldn't mind forbidding tabs in every case but the literal block scalar.
The use case is that you can indent any (printable) block of text and
make it literal by slapping
    foo: |
in front of it.
> 5. In the flow context, ':' and ',' should not be allowed for plain
> scalars and should always be separators. This means that the document
>
> --- [1:2,3:4]
>
> is the same as
>
> --- [ 1 : 2 , 3 : 4 ]
>
> It was already discussed here, I added it for completeness. The
> following two issues assume that this rule is implemented.
I think it must be that way.
I don't see people writing many flow collections by hand anyway.
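A rough sketch of what rule 5 means for a tokenizer, in Python. This is only an illustration under simplifying assumptions: the function `flow_tokens` is hypothetical, quoted scalars, anchors and tags are ignored, and every one of `[ ] { } , :` is treated as a separator wherever it appears.

```python
import re

# Either a single flow indicator, or a run of text that contains none of
# them (a plain scalar, possibly with inner spaces).
FLOW_TOKEN = re.compile(r"\s*([\[\]{},:]|[^\[\]{},:\s][^\[\]{},:]*)")


def flow_tokens(text):
    """Split a flow collection into indicator and plain-scalar tokens."""
    text = text.rstrip()
    tokens = []
    pos = 0
    while pos < len(text):
        match = FLOW_TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"cannot tokenize at offset {pos}")
        # Trailing spaces before the next indicator are not part of the scalar.
        tokens.append(match.group(1).strip())
        pos = match.end()
    return tokens
```

Under these assumptions `[1:2,3:4]` and `[ 1 : 2 , 3 : 4 ]` produce the same token list, which is the equivalence the proposal asks for.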
> 6. 'ns-anchor-name' is too greedy:
> ns-anchor-name ::= ns-char+
>
> It may cause confusion for the documents like
>
> --- { &A 1,*A,1 }
>
> I want it to be restricted to 'ns-word-char+'. By the way, why isn't '_'
> included in 'ns-word-char'? It can be useful for identifiers.
+1 (on both counts)
'_' is also a word char in all regexp implementations.
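To illustrate the greediness, here is a small Python sketch. The two patterns are approximations, not the spec productions: `\S+` stands in for 'ns-char+', and `[0-9A-Za-z_-]+` stands in for 'ns-word-char+' with '_' added as proposed. The helper `node_name` is hypothetical.

```python
import re

GREEDY_NAME = re.compile(r"\S+")            # rough stand-in for ns-char+
WORD_NAME = re.compile(r"[0-9A-Za-z_-]+")   # rough stand-in for ns-word-char+ plus '_'


def node_name(text, pattern):
    """Return the anchor or alias name that follows a leading '&' or '*',
    as matched by `pattern`, or None if nothing matches."""
    if not text or text[0] not in "&*":
        raise ValueError("expected text starting with '&' or '*'")
    match = pattern.match(text, 1)
    return match.group(0) if match else None
```

Applied to the alias in `{ &A 1,*A,1 }`, the greedy pattern swallows `A,1` as the name, while the restricted one stops at `A` and leaves the comma as a separator.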
> 7. I'd like to forbid empty plain scalars if the anchor or tag is
> specified. For instance, I want to make the following example
> ill-formed:
>
> ---
> block key: !tag
> ---
> { !perl/A::Package }
>
> For the latter example, it's not clear if it is
> { !perl/A::Package '' }
> or
> { !perl/A: '' : 'Package' }
> This is the reason why I want this rule.
Does this prevent us from using ':' in a tag in a flow collection?
> 8. 'y' and 'n' should be removed from the boolean constants. It was
> already discussed on #yaml.
+1
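A sketch of what the trimmed boolean resolution might look like in Python. The word lists assume the spellings in the YAML 1.1 bool type definition with the 'y'/'Y' and 'n'/'N' forms dropped; `resolve_bool` is a hypothetical name.

```python
# Boolean spellings, assuming the YAML 1.1 bool type minus y/Y and n/N.
TRUE_WORDS = {"yes", "Yes", "YES", "true", "True", "TRUE", "on", "On", "ON"}
FALSE_WORDS = {"no", "No", "NO", "false", "False", "FALSE", "off", "Off", "OFF"}


def resolve_bool(scalar: str):
    """Return True or False for a recognized boolean word, else None
    (meaning the plain scalar is not a boolean)."""
    if scalar in TRUE_WORDS:
        return True
    if scalar in FALSE_WORDS:
        return False
    return None
```

With this change a plain 'y' or 'n' is no longer a boolean, so it would fall through to whatever resolution comes next, typically a string.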
> That's all so far.
Cool. I will try to get all the spec components checked into
http://svn.yaml.org/spec/ asap. After that I will give commit access to
interested parties.
Cheers, Ingy