Re: WikiDom serializers
Gabriel Wicke <wicke <at> wikidev.net>
2011-11-07 18:59:19 GMT
Hello Trevor and list,
Since last week's integration into the VisualEditor extension, things have
been progressing well. Lists (including definition lists) and tables are now
parsed to WikiDom and rendered by the HTML serializer. The parser is still
quite rough and in flux at this stage, but the general structures mostly work
when running the parser tests with node.js. The Wikitext serializer is not yet
wired up, as I am currently concentrating on the parser and its WikiDom
output, but it will be added for round-trip testing.
Apart from general grammar tweaking, I am now working on converting inline
elements into WikiDom annotations. The main challenge is the calculation of
plain-text offsets. I am trying to avoid building an intermediate structure,
but might fall back to one if things get too messy when interleaving this
calculation with parsing.
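To illustrate the offset calculation, here is a minimal sketch of the fallback variant that does build an intermediate structure. The tree shape, the function name flattenInline and the annotation format ({ type, range: { start, end } }) are assumptions made for this example, not the parser's actual output:

```javascript
// Hypothetical intermediate structure: strings are plain text, objects are
// inline elements with a type and a list of children.
function flattenInline(nodes) {
  let text = '';
  const annotations = [];

  function walk(list) {
    for (const node of list) {
      if (typeof node === 'string') {
        text += node;
      } else {
        // Remember the plain-text offset before the children are emitted.
        const start = text.length;
        walk(node.children);
        // After the children, text.length is the end offset (exclusive).
        annotations.push({ type: node.type, range: { start, end: text.length } });
      }
    }
  }

  walk(nodes);
  return { text, annotations };
}

// Roughly the tree for: x ''a '''b''' c''
const result = flattenInline([
  'x ',
  { type: 'italic', children: ['a ', { type: 'bold', children: ['b'] }, ' c'] },
]);
console.log(result.text, JSON.stringify(result.annotations));
```

Offsets here are counted in UTF-16 code units, because that is what String length measures in JavaScript. Doing the same bookkeeping inside the grammar actions would avoid the tree, at the cost of threading the running offset through every inline rule.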
>> - es.AnnotationSerializer needs some nesting smartness, so
>> that overlapped regions open and close properly (<b>a<i>b</b>c</i>
>> should be <b>a<i>b</i></b><i>c</i> - es.ContentView does this
>> correctly but is working from the linear data model)
Parsing these overlapped annotations is not well supported right now, but
should be doable using a multi-pass or shallow (token-only) parsing strategy
for inline content. Pushing nesting and content-model fix-ups to the
serializer should also make it easier to approximate the parsing rules in the
HTML5 specification without forcing too much normalization. For the most part,
the HTML5 parsing spec is a somewhat more systematic version of what tidy does
right now after the MediaWiki parser has tried its best.
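A minimal sketch of such a serializer-side nesting fix-up, assuming annotations are plain { tag, start, end } ranges over the text. The names serializeAnnotations and escapeHtml are hypothetical and this is not the actual es.AnnotationSerializer API:

```javascript
function escapeHtml(s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

function serializeAnnotations(text, annotations) {
  // Every offset at which an annotation opens or closes.
  const offsets = new Set([0, text.length]);
  for (const a of annotations) {
    offsets.add(a.start);
    offsets.add(a.end);
  }
  const sorted = [...offsets].sort((x, y) => x - y);

  const stack = []; // currently open annotations, outermost first
  let out = '';
  let prev = 0;
  for (const pos of sorted) {
    out += escapeHtml(text.slice(prev, pos));
    prev = pos;

    // Close everything down to the outermost annotation ending here,
    // remembering those that only closed to keep the nesting valid.
    const reopen = [];
    while (stack.some(a => a.end === pos)) {
      const top = stack.pop();
      out += `</${top.tag}>`;
      if (top.end !== pos) reopen.push(top);
    }
    // Reopen the interrupted annotations, outermost first.
    for (const a of reopen.reverse()) {
      out += `<${a.tag}>`;
      stack.push(a);
    }
    // Open annotations starting here; longer ones go outside.
    const starting = annotations
      .filter(a => a.start === pos && a.end > pos)
      .sort((a, b) => b.end - a.end);
    for (const a of starting) {
      out += `<${a.tag}>`;
      stack.push(a);
    }
  }
  return out;
}

const example = serializeAnnotations('abc', [
  { tag: 'b', start: 0, end: 2 },
  { tag: 'i', start: 1, end: 3 },
]);
console.log(example); // <b>a<i>b</i></b><i>c</i>
```

With this approach the parser can emit overlapping ranges as-is, and the closing and reopening of tags happens only at output time.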
Some early fix-ups seem to be needed to allow proper editing, in particular
of block-level elements, so I am currently a bit sceptical about avoiding
>> - We need some sort of context that can be asked for the HTML of a
>> template, whether a page exists, etc. Initially this work is all done
>> on the client, which means this is a wrapper for a lot of API calls,
>> but either way, having a firm API between the renderer and the site
>> context will help keep things clean and flexible
Brion already implemented a simple context object for transclusion tests.
This is probably not yet the final API, but it is already a good start.
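To make the idea of a boundary between renderer and site context concrete, here is a minimal sketch. It is not Brion's context object: the class StaticSiteContext, its methods pageExists and getTemplateHtml, and the renderLink helper are all hypothetical names. The methods are asynchronous on the assumption that a client-side version would wrap API calls:

```javascript
// In-memory stand-in for a site; a client-side version would issue API
// requests behind the same two methods.
class StaticSiteContext {
  constructor(pages) {
    // pages: plain object mapping full page titles to stored HTML
    this.pages = new Map(Object.entries(pages));
  }

  async pageExists(title) {
    return this.pages.has(title);
  }

  async getTemplateHtml(name) {
    const html = this.pages.get('Template:' + name);
    return html === undefined ? null : html;
  }
}

// A renderer only talks to the context, never to the site directly.
// Titles are assumed to contain no HTML special characters.
async function renderLink(context, title) {
  const exists = await context.pageExists(title);
  const cls = exists ? '' : ' class="new"'; // mark links to missing pages
  return `<a href="/wiki/${encodeURIComponent(title)}"${cls}>${title}</a>`;
}

const demoContext = new StaticSiteContext({
  'Main Page': '<p>Welcome</p>',
  'Template:Hello': '<b>Hello</b>',
});
renderLink(demoContext, 'Missing page').then(html => console.log(html));
```

Swapping the in-memory map for a server-side lookup or a batch of API calls would then not touch the renderer at all.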
HTML5 parsing spec: http://dev.w3.org/html5/spec/Overview.html#parsing