Gabriel Wicke | 2 Apr 2012 10:56

Re: Fwd: [Wikitech-l] Cutting MediaWiki loose from wikitext

On 03/26/2012 10:56 PM, Daniel Kinzler wrote:
> On 26.03.2012 22:28, Amgine wrote:
>> Are we talking about mime types for articles?
>
> Yes, pretty much. Though technically, the mime type would only describe the
> serialization format (e.g. "application/json"), not the data model (e.g.
> "wikidata entity record") - both bits of information are needed. But
> essentially, yes: pages will have types, and types have handlers for displaying,
> editing, etc.

+1 for making serialization / data model information explicit. Parsoid 
is also structured around per-input mime types, although currently only 
a generic 'text/wiki' placeholder type is implemented. Each input type 
has a specific parser pipeline associated with it, which eventually 
produces tokens (at this stage mostly synonymous with HTML tags) 
independent of input type for the last, shared token transformation phase.

There is no distinction between processing for displaying vs. editing 
currently, as we try to preserve all relevant information for editing 
using structured data (leaning towards RDFa) and attribute annotations 
in a displayable DOM. We try to support schema-like information for 
template editing. There might be a way to accommodate schema-like 
information for Wikidata information using the same setup.

Mime types similar to those described for XML in RFC 3023 could be used 
for JSON-serialized data. Maybe something like 
application/wikidata+json? The syntax looks a bit backwards to me, but 
at least there is the XML precedent, and anything with a +json suffix can 
be handled as generic JSON when the specific data model is not known.
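
As a rough illustration of the suffix fallback described above, here is a minimal sketch. The `handlers` registry, the `parseByMimeType` function and the returned `model` labels are all hypothetical names made up for this example; they are not part of MediaWiki, Wikidata or Parsoid.

```javascript
// Hypothetical registry of handlers for specific data models.
// The type name follows the suggestion in the text; the handler is made up.
const handlers = {
  'application/wikidata+json': (text) => ({ model: 'wikidata', data: JSON.parse(text) }),
};

// Pick a handler for a media type, falling back to generic JSON handling
// for any unknown type that carries a "+json" suffix.
function parseByMimeType(mimeType, text) {
  // Drop parameters such as "; charset=utf-8" and normalize case.
  const type = mimeType.split(';')[0].trim().toLowerCase();
  if (Object.prototype.hasOwnProperty.call(handlers, type)) {
    return handlers[type](text);
  }
  if (type === 'application/json' || type.endsWith('+json')) {
    return { model: 'generic-json', data: JSON.parse(text) };
  }
  throw new Error('No handler for media type: ' + type);
}
```

A consumer that does not know the specific data model can still read the payload as plain JSON, which is the main appeal of the suffix convention.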

Gabriel Wicke | 2 Apr 2012 11:24

Re: Template contains list items, but no start tag?

On 03/26/2012 05:31 PM, Trevor Parscal wrote:
> I may not be understanding your question very well, but I think this might help.
>
> List items that come from different sources will be treated as separate
> lists in the editor (at least initially) even if they are rendered as a
> contiguous list in the final view. We may add some slick features to the
> editor to blend the native and generated portions of the document
> together better, but that will be something we work on adding down the road.

IMHO it would make sense to treat lists with items coming from templates 
similarly to tables produced by table start / row / end templates: as 
something partially constructed from templates. We can mark both tables 
and lists as being composed from templates in the parser, which should 
make it quite easy to initially protect those lists as opaque blobs in 
the editor, until advanced support for mixed-source structures is 
implemented.
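
A minimal sketch of that marking, assuming each list item carries a `source` field set by the parser (the field name, its values and the `annotateList` helper are hypothetical, not existing Parsoid or VisualEditor code):

```javascript
// Each list item records where it came from; "source" is an assumed field
// with the illustrative values 'page' and 'template'.
function annotateList(items) {
  const fromTemplate = items.some((item) => item.source === 'template');
  return {
    type: 'list',
    items,
    // A list with any template-generated item is treated as one opaque,
    // read-only unit until mixed-source editing is supported.
    editable: !fromTemplate,
  };
}
```

The editor would then only need to check the `editable` flag rather than understand how the list was assembled.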

Gabriel
Ori Livneh | 2 Apr 2012 19:27

Re: Patch for parser

Awesome -- thanks!

I have a couple of questions to ask, too. These aren't urgent, so feel free to reply whenever you have the time.

1) Conformance vs. enhancement

I'm not sure if you saw my comment in the diff, but some of the ISBN validation code I wrote is deliberately unreachable because it's stricter than the naive validation MediaWiki currently performs. (I stupidly wrote it before checking to see what exactly the current parser does.) In general, is it ever desirable to try and improve on the old parsing grammar or is total conformance the overriding goal?

2) ECMAScript target

Should I strive to write code that is compatible with older browsers? Thus far I've avoided ES5 constructs like forEach, map, filter, etc. But if this is only going to run server-side under node for the foreseeable future, maybe that's an unnecessary handicap.

3) Intermediate representation vs. parser hacks

The preliminary work I've done to parse behavior switches (__TOC__ & friends) has the parser directly toggle attributes on a configuration object. It does this rather than produce a token for some subsequent tree visitor to interpret. So the parsing is arguably somewhat lossy in this case, but possibly not: when serializing back to wikitext, you could read all behaviors from the configuration object and encode them at the top; their position isn't significant, afaik.

So: did I make the right decision? PEG.js's support for arbitrary JavaScript code in the grammar makes this tempting to do sometimes, but perhaps it's wiser to insist that the parser not overreach the goal of building a tree. What is your take?

I'm CCing wikitext-l, since your responses are likely to be useful to future contributors.

Best,
Ori

On Mon, Apr 2, 2012 at 12:41 PM, Gabriel Wicke <gwicke <at> wikimedia.org> wrote:
Hi Ori,

I committed your patch in https://gerrit.wikimedia.org/r/#change,4094. Looks good, and lets 14 more tests pass ;) Thanks!!

Gabriel


On Mon, Apr 2, 2012 at 9:59 AM, Ori Livneh <ori.livneh <at> gmail.com> wrote:
Hey Gabriel,

Working on this on stolen time, as it were, but I do have some modest progress to report. The attached patch revises the handling of RFC auto-links and adds support for ISBNs, PMID links, and preliminary support for behavior switches (things like __TOC__).

I applied for git access, so hopefully I'll have my own branch set up soon and we'll be able to do this in a slightly less haphazard way.

Hope you've had a nice trip,

Ori


_______________________________________________
Wikitext-l mailing list
Wikitext-l <at> lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitext-l
Gabriel Wicke | 3 Apr 2012 13:12

Re: Patch for parser

On 04/02/2012 07:27 PM, Ori Livneh wrote:
> Awesome -- thanks!
> 
> I have a couple of questions to ask, too. These aren't urgent, so feel
> free to reply whenever you have the time.
> 
> 1) Conformance vs. enhancement
> 
> I'm not sure if you saw my comment in the diff, but some of the ISBN
> validation code I wrote is deliberately unreachable because it's
> stricter than the naive validation MediaWiki currently performs. (I
> stupidly wrote it before checking to see what exactly the current parser
> does.) In general, is it ever desirable to try and improve on the old
> parsing grammar or is total conformance the overriding goal?

Breaking commonly-used wikitext constructs would need a very good
justification, while enhancements / clean-ups for little-used edge cases
are a good idea. I created a dumpGrepper tool (see
VisualEditor/tests/parser/dumpGrepper.js) to check how common some
constructs are.

For ISBN in particular, the following bug discussed the trade-offs
involved in stricter validation:
https://bugzilla.wikimedia.org/show_bug.cgi?id=2391
There is now strict validation and a warning in Special:Booksources, but
the parser still recognizes invalid ISBNs.
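
For reference, strict validation usually means verifying the check digit. The sketch below uses the standard ISBN-10 and ISBN-13 checksum rules; the function name `isValidIsbn` is made up, and this is neither the code from Ori's patch nor the Special:Booksources implementation.

```javascript
// Strip hyphens and spaces, then verify the check digit.
function isValidIsbn(input) {
  const s = input.replace(/[\s-]/g, '').toUpperCase();
  if (/^\d{9}[\dX]$/.test(s)) {
    // ISBN-10: weights run 10..1, "X" stands for 10 in the last position,
    // and the weighted sum must be divisible by 11.
    let sum = 0;
    for (let i = 0; i < 10; i++) {
      const digit = s[i] === 'X' ? 10 : Number(s[i]);
      sum += digit * (10 - i);
    }
    return sum % 11 === 0;
  }
  if (/^\d{13}$/.test(s)) {
    // ISBN-13: weights alternate 1, 3 and the sum must be divisible by 10.
    let sum = 0;
    for (let i = 0; i < 13; i++) {
      sum += Number(s[i]) * (i % 2 === 0 ? 1 : 3);
    }
    return sum % 10 === 0;
  }
  return false;
}
```

A lenient parser would accept anything matching the digit pattern, while a strict one would additionally require this check to pass.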

> 2) ECMAScript target
> 
> Should I strive to write code that is compatible with older browsers?
> Thus far I've avoided ES5 constructs like forEach, map, filter, etc.
> But if this is only going to run server-side under node for
> the foreseeable future, maybe that's an unnecessary handicap. 

We don't have concrete plans to run this code on old clients, and by the
time that would happen most browsers might be recent enough. Performance
is important though, and simple constructs like for / while loops were
still faster on V8 last time I checked.

> 3) Intermediate representation vs. parser hacks
> 
> The preliminary work I've done to parse behavior switches (__TOC__ &
> friends) has the parser directly toggle attributes on a configuration
> object. It does this rather than produce a token for some subsequent
> tree visitor to interpret. So the parsing is arguably somewhat lossy in
> this case, but possibly not: when serializing back to wikitext, you could
> read all behaviors from the configuration object and encode them at the
> top; their position isn't significant, afaik.

The position of the switch is still significant to avoid dirty diffs.
Adam Wight actually implemented a bit of the back-end infrastructure for
this in https://gerrit.wikimedia.org/r/#change,4050. We should be able
to combine the two patches by joining your tokenizer work with Adam's
token transformer work.

> So: did I make the right decision? PEG.js's support for arbitrary
> JavaScript code in the grammar makes this tempting to do sometimes, but
> perhaps it's wiser to insist that the parser not overreach the goal of
> building a tree. What is your take?

The tokenizer should try hard to produce something round-trippable while
still trying to perform most context-free parsing if it can. It does
build a parse tree in the process, but then immediately flattens it by
emitting a stream of tokens. In this case we should emit tokens for the
behavior switches, which are then handled in a token stream transformer
(see Adam's patch).
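
To make the idea concrete, here is a simplified sketch of emitting switch tokens and handling them in a later pass. It is not the PEG.js grammar or Adam's transformer; the token shapes, the function names and the `__([A-Z]+)__` pattern are assumptions for illustration only.

```javascript
// Tokenizer stage: emit one token per behavior switch instead of mutating
// configuration, so the switch keeps its place in the stream.
function tokenize(wikitext) {
  const tokens = [];
  const switchRe = /__([A-Z]+)__/g; // simplified; real switches vary
  let last = 0;
  let m;
  while ((m = switchRe.exec(wikitext)) !== null) {
    if (m.index > last) {
      tokens.push({ type: 'text', value: wikitext.slice(last, m.index) });
    }
    tokens.push({ type: 'behavior-switch', name: m[1], src: m[0] });
    last = m.index + m[0].length;
  }
  if (last < wikitext.length) {
    tokens.push({ type: 'text', value: wikitext.slice(last) });
  }
  return tokens;
}

// Token stream transformer: collects page-level settings from the switch
// tokens but leaves the tokens in the stream for later serialization.
function collectSwitches(tokens) {
  const config = {};
  for (const t of tokens) {
    if (t.type === 'behavior-switch') config[t.name.toLowerCase()] = true;
  }
  return { tokens, config };
}

// Serializer: concatenating the token sources reproduces the input.
function serialize(tokens) {
  return tokens.map((t) => (t.type === 'text' ? t.value : t.src)).join('');
}
```

Because the switch stays in the token stream at its original position, serializing the tokens gives back the same wikitext, which avoids dirty diffs.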

I plan to port the wikitext serializer in the editor to operate on
tokens instead, so that we can do round-trip testing at different levels
in the processing chain. Serializing a DOM subtree is then equivalent to
walking the tree and calling the start / end token serializers for each
DOM node. Behavior switches thus need to be represented as (invisible)
nodes in the DOM as well to support round-tripping via the DOM. HTML5
Microdata would suggest the meta element for this, while I am not 100%
sure which would be the preferred element for RDFa. Perhaps a span
without text content. The mapping from internal token to meta / span can
be performed in the token stream transformer. We should standardize on a
single way to mark up invisible content to simplify serialization.
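
A rough sketch of such a tree walk, using plain objects in place of real DOM nodes. The `serializers` table, the `serializeNode` function and the `data-switch` attribute name are hypothetical and only show the shape of the approach:

```javascript
// Hypothetical per-element start / end serializers keyed by node name.
const serializers = {
  p: { start: () => '', end: () => '\n\n' },
  b: { start: () => "'''", end: () => "'''" },
  i: { start: () => "''", end: () => "''" },
  // Invisible node carrying a behavior switch; the attribute name is made up.
  meta: { start: (node) => '__' + node.attrs['data-switch'] + '__', end: () => '' },
};

// Walk the subtree depth-first, calling the start serializer before the
// children and the end serializer after them.
function serializeNode(node) {
  if (node.type === 'text') return node.value;
  const s = serializers[node.name];
  if (!s) throw new Error('No serializer for <' + node.name + '>');
  let out = s.start(node);
  for (const child of node.children || []) out += serializeNode(child);
  return out + s.end(node);
}
```

With the switch kept as an invisible node, it survives a trip through the DOM and is written back at the same position.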

Gabriel
Tom Roche | 10 Apr 2012 21:39

offline export of wikitext to slide-like presentation format?


Is there a tool (or a chain) that would allow one to input wikitext
and output either

* something slidelike (ODP or PDF, suitably formatted)
* something that easily converts to something slidelike (e.g., 
  TeX beamer, HTML for Slidy)

Why I ask:

I have several long pages on a governmental MediaWiki that I used as
rough drafts for a presentation, which (after favorable review) I'd
like to turn into "real slides." Unfortunately the admins of that MW
are notoriously savage, so my chances of getting something installed
on the wiki instance, e.g. the mw-slidy extension

http://www.mediawiki.org/wiki/Extension:Mw-slidy

, are negligible (though I'll ask). Hence I probably need something
that converts offline (i.e., not on a MediaWiki); I seek, e.g., a
commandline tool such that I can do something like

$ magic < input.mediawiki > output.pdf

One option is Eclipse Mylyn's WikiText module

http://wiki.eclipse.org/Mylyn/Incubator/WikiText

but that seems rather heavyweight for this task, since I'm not
currently using Eclipse for anything else. I'd prefer something like
Deplate

http://deplate.sourceforge.net/

but that only inputs the markups={rdoc, viki}, which both seem quite
remote from (MediaWiki's) wikitext: i.e., creating a wikitext ->
{rdoc, viki} converter seems like more work than writing HTML or TeX
by hand.

Am I missing something? Care to recommend a candidate app, or
Something Completely Different?

Your suggestions are appreciated, Tom Roche <Tom_Roche <at> pobox.com>
Gabriel Wicke | 10 Apr 2012 21:56

Re: offline export of wikitext to slide-like presentation format?

On 04/10/2012 09:39 PM, Tom Roche wrote:
> 
> Is there a tool (or a chain) that would allow one to input wikitext
> and output either
> 
> * something slidelike (ODP or PDF, suitably formatted)
> * something that easily converts to something slidelike (e.g., 
>   TeX beamer, HTML for Slidy)

I used pandoc [1] in the past with good results on basic wikitext. It
supports S5 HTML slide shows, TeX (including beamer), PDF via pdflatex
and even docx/ODT. It does not, however, support all MediaWiki syntax.

Gabriel

[1]: http://johnmacfarlane.net/pandoc/
Jonas Brekle | 16 Apr 2012 20:27

Userscripts for the VisualEditor?

Hi,

I have not really looked into the code yet, but I just wanted to get
your opinions on the question: Will there be a possibility to integrate
userscripts or other extensions into the VisualEditor? In particular, I
would be interested in reusing Parsoid to retrieve something like
"get parent heading from cursor"... I hope that makes sense :)

Regards 
Jonas
Trevor Parscal | 16 Apr 2012 20:33

Re: Userscripts for the VisualEditor?

Jonas,


The short answer is yes, you will be able to extend VisualEditor with user scripts. Gadgets and extensions will also be able to extend VisualEditor. The way this will work is not totally set in stone, and won't be for a couple more months, but we have plans to not only provide an API but also use that API for the features that we implement, to ensure it will be robust and capable.

Rob Moen is in charge of this area, so he will be the best person to talk to as this comes together. He's on the list now, so you can just continue the conversation here and he will get your messages.

- Trevor


Jonas Brekle | 16 Apr 2012 21:23

Re: Userscripts for the VisualEditor?

Nice!

So there isn't even an unstable API yet, right?
I wouldn't mind some API changes if this would spare me a much harder
implementation of my own.
Any hints on the easiest way to do what I described?

Sumana Harihareswara | 18 Apr 2012 02:11

previous WYSIWYG editor: InlineEditor

I was made aware of work done on an "inline editor" -- see
http://www.mediawiki.org/wiki/Extension:InlineEditor and code at
http://svn.wikimedia.org/svnroot/mediawiki/trunk/extensions/InlineEditor/ .

This work was done by a team coordinated by GRNET (Greek Research
Network), which developed a WYSIWYG editor that produces MediaWiki syntax.

I presume that the Visual Editor team (Wikimedia Foundation and Wikia)
already know about this work as a predecessor to their own, but just
wanted to share it in case it's useful to anyone else while they wait
for the Visual Editor.

-- 
Sumana Harihareswara
Volunteer Development Coordinator
Wikimedia Foundation
