Andreas Jonsson | 1 Sep 2010 09:47

Re: Image links

2010-08-31 21:33, mrthetooth wrote:
> I use images in captions all the time actually, for situations like "The yellow circle
> [[Image:circle-yellow.png]] on this map indicates..."
>    
Ok, I'll add support for one nesting level for images.

/Andreas

> -----Original Message-----
> From: wikitext-l-bounces <at> lists.wikimedia.org
> [mailto:wikitext-l-bounces <at> lists.wikimedia.org] On Behalf Of Platonides
> Sent: Tuesday, August 31, 2010 11:54 AM
> To: Wikitext-l
> Subject: Re: [Wikitext-l] Image links
>
> Andreas Jonsson wrote:
>    
>> Is there any known use for putting an image inside an image caption,
>> or is the restriction I propose here sufficient?
>>      
> Doesn't seem like an important feature, although I'm pretty sure that
> someone would be using it, e.g. using an image to note the language, an
> IPA logotype...
>
> _______________________________________________
> Wikitext-l mailing list
> Wikitext-l <at> lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitext-l
>

Mark Clements (HappyDog) | 2 Sep 2010 15:15

Re: On parsing tokenized wikitext

"Andreas Jonsson" <andreas.jonsson <at> kreablo.se> wrote in message 
news:4C72738C.80704 <at> kreablo.se...
> * The call to the "begin" method is delayed until some actual inlined
>   content is produced.  The call is never taken if an "end" event is
>   received before such content.

Does this mean that constructs such as <span id="JSPlaceholder"></span> are 
obliterated by the lexer?  Some empty inline (and block) elements may have 
an important purpose as a JS DOM hook, and should not be removed from the 
output stream.

- Mark Clements (HappyDog) 
Andreas Jonsson | 2 Sep 2010 18:31

Re: On parsing tokenized wikitext

2010-09-02 15:15, Mark Clements (HappyDog) wrote:
> "Andreas Jonsson" <andreas.jonsson <at> kreablo.se>  wrote in message
> news:4C72738C.80704 <at> kreablo.se...
>    
>> * The call to the "begin" method is delayed until some actual inlined
>>    content is produced.  The call is never taken if an "end" event is
>>    received before such content.
>>      
> Does this mean that constructs such as <span id="JSPlaceholder"></span> are
> obliterated by the lexer?  Some empty inline (and block) elements may have
> an important purpose as a JS DOM hook, and should not be removed from the
> output stream.
>    

Yes, that is correct.  This is what the original parser does for <i> and
<b>.  But now that you mention it, I realize that this is probably just
an artefact of cleaning up the apostrophe mess.

I have changed it so that empty inline HTML elements are always included.
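A minimal sketch of the behavior discussed above, written in Python purely for illustration: "begin" events for generated formatting (such as italics from apostrophes) are buffered until real content arrives and are dropped if the matching "end" comes first, while explicit HTML elements are always emitted. The `DelayedBeginWriter` class and its `explicit` flag are hypothetical and are not libmwparser's actual listener interface.

    from dataclasses import dataclass, field


    @dataclass
    class DelayedBeginWriter:
        """Hypothetical listener that buffers 'begin' events for generated
        formatting until some inline content is produced."""
        out: list = field(default_factory=list)
        pending: list = field(default_factory=list)  # (tag, attrs) not yet emitted

        def begin(self, tag, attrs="", explicit=False):
            if explicit:
                # Explicit HTML from the wikitext is kept even if it stays
                # empty, so it counts as content for pending generated tags.
                self._flush()
                self.out.append(f"<{tag}{attrs}>")
            else:
                self.pending.append((tag, attrs))

        def end(self, tag, explicit=False):
            if not explicit and self.pending and self.pending[-1][0] == tag:
                # No content since the matching begin: drop the element.
                self.pending.pop()
                return
            self.out.append(f"</{tag}>")

        def text(self, s):
            if s:
                self._flush()
                self.out.append(s)

        def _flush(self):
            for tag, attrs in self.pending:
                self.out.append(f"<{tag}{attrs}>")
            self.pending.clear()

        def html(self):
            return "".join(self.out)


    w = DelayedBeginWriter()
    w.begin("i")                      # generated by '' with nothing before ''
    w.end("i")
    w.begin("span", ' id="JSPlaceholder"', explicit=True)
    w.end("span", explicit=True)
    w.begin("b")
    w.text("bold")
    w.end("b")
    print(w.html())  # <span id="JSPlaceholder"></span><b>bold</b>

The empty generated <i> disappears, while the empty JS placeholder span survives.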

/Andreas
James Salsman | 4 Sep 2010 01:43

LIME php parser

I would like to use LIME -- http://sourceforge.net/projects/lime-php/
-- instead of a series of regular expression replacement statements to
convert GIFT -- http://microformats.org/wiki/gift -- to the Quiz
Extension format, both for purposes of maintainability and
readability.  However, I am concerned about the code review situation,
and I am not sure whether it is reasonable to depend on what would
be a much more difficult code review.

On the other hand, LIME has been stable for years, has two good
reviews and the author seems reasonable:
http://c2.com/cgi/wiki?IanKjos

The included calculator example is easily accessible and all my
experiments with it so far have gone well.  I love the fact that it
includes an option for native code compilation of inner loop code
(lemon.c), but I am also interested in how it behaves when populating
larger data structures and in production PHP.  Does anyone know
anyone else who has used it?

My understanding is that some subsets of the wikitext parser could
easily be converted to a more formal grammar while others need to
remain in PHP (e.g., transclusion), and I am familiar with many of
wikitext's parsing ambiguity conflicts.  I am not an expert in how to
resolve such conflicts in LALR(1) grammars -- although I can squeak
through the trial-and-error process.  However, I am absolutely certain
that moving wikitext parsing to a formal grammar would provide some
serious opportunities for engineering improvements, including in
maintainability, readability, and related efforts.

Therefore, I am considering submitting LIME for code review, but I [...]

Jan Paul Posma | 6 Sep 2010 00:20

InlineEditor extension

Hello,

In commit 72458 I've added the InlineEditor extension. [1] This extension is a working implementation of the prototype(s) posted earlier on this list. It is not intended for use on live wikis; it is rather a proof of concept and a framework to experiment with. I will explain the extension in detail for those of you who might be interested.

== Design overview ==
The extension consists of several parts, structured in sub-directories like the UsabilityInitiative extension. The InlineEditor extension itself provides a framework for different edit modes to build on. It displays the edit modes, provides an interface to mark editable pieces of wikitext, provides a client-side inline editor which the edit modes *may* use, is configurable with several fallback options to the full/traditional editor, and handles previewing, publishing, undo and redo.

Each of the other extensions provides an edit mode for the InlineEditor extension. They hook into InlineEditorMark and InlineEditorDefineEditors. The first hook is called whenever wikitext is passed through the extension, so that every edit mode can mark its editable pieces. Once this is done, a few algorithms combine these markings with information about previously edited pieces and generate both the wikitext to run through the parser and a JSON structure, passed to the client, that maps the editable pieces to the original wikitext. The second hook is used to add CSS, JS and messages to the page.
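The mark-and-combine step can be illustrated with a small Python sketch. It is not the extension's PHP code: the edit modes are modelled as plain functions returning (start, end) offsets, the `<span id=...>` wrapping and the JSON layout are assumptions, and markings are kept mutually exclusive by letting the first edit mode that claims a range win.

    import json
    import re


    def mark_sentences(wikitext):
        """Hypothetical edit mode: every run of text ending in '.' is editable."""
        return [m.span() for m in re.finditer(r"[^.\n]+\.", wikitext)]


    def mark_whole_text(wikitext):
        """Hypothetical coarser edit mode that would claim everything."""
        return [(0, len(wikitext))]


    def combine_markings(wikitext, edit_modes):
        """Merge markings from all edit modes (first claim wins, no overlap),
        then build the wikitext to parse and the JSON map sent to the client."""
        spans = []
        for mode in edit_modes:
            for start, end in mode(wikitext):
                if all(end <= s or start >= e for s, e in spans):
                    spans.append((start, end))
        spans.sort()

        pieces, mapping, pos = [], {}, 0
        for i, (start, end) in enumerate(spans):
            piece_id = f"inline-editor-{i}"
            pieces.append(wikitext[pos:start])
            pieces.append(f'<span id="{piece_id}">{wikitext[start:end]}</span>')
            mapping[piece_id] = wikitext[start:end]  # original wikitext per piece
            pos = end
        pieces.append(wikitext[pos:])
        return "".join(pieces), json.dumps(mapping)


    marked, payload = combine_markings("First. Second.",
                                       [mark_sentences, mark_whole_text])
    print(marked)
    print(payload)

Here the whole-text marking is rejected because it overlaps the sentences already claimed.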

== Limitations ==
There are many things which are sub-optimal right now:
* The editor is slow. Whenever a small element is changed and previewed, the entire page is reparsed. This will be fixed by parsing only the changed element where possible (it is not always possible, e.g. references have side effects at the bottom of the page).
* For now it is only possible to use the editor as the primary editor, with a link to the full/traditional editor. There will be a configuration option to choose between this and displaying a message at the top of the traditional edit page to switch to this editor.
* I haven't tested it in older browsers (or in IE at all, for that matter). I only know it runs fine in Firefox and Chrome; it may have bugs in other browsers.
* The edit modes are really, really basic right now, and they may or may not screw things up. Most of them consist of just one or a few regular expressions, which do well in general but may fail on many edge cases.
* The editor may not handle all the messages and edge cases of the traditional editor.
* The extension is written for MediaWiki 1.16 but may or may not work with other versions.

Also, I'm not sure at all whether the current set of edit modes is the way to go. Currently they are mutually exclusive, meaning that text marked by one editor is never included in text marked by another editor. However, it may be better not to have edit modes like this, but rather different levels of editing granularity, e.g. sentence => paragraph => block. This way the user will become familiar with more wikitext instead of always seeing small portions. The framework currently doesn't allow overlapping markings, but I will work to make this possible.

== Goals ==
The goal of this extension is to provide a framework to easily play with different modes of editing in-line. Feel free to write extensions that use this framework, or help with the framework itself. Any usability or technical suggestions are also welcome!

I hope to get some documentation up on mediawiki.org soon, but note that the code is heavily documented inline. Feel free to ask any questions: I'm probably forgetting to mention some things that may not be clear to everyone. Also, there is no public wiki to test this extension on at the moment. I will work on that, but if someone else can enable it on a test wiki that would be great too!

To install the extension(s), check the instructions in /trunk/extensions/InlineEditor/InlineEditor.php. Thanks for your time reading this!

Regards,
Jan Paul

Andreas Jonsson | 6 Sep 2010 23:46

PHP wrapper library for libmwparser available.

I have just committed an initial version of a PHP wrapper library for
my parser.

http://svn.wikimedia.org/svnroot/mediawiki/trunk/parsers/libmwparser

An example of how it can be used:

   include("mwp.php");

   // Parse a first article from an in-memory UTF-8 string.
   $istream = MWParserOpenString("input", "<strong id=hello>Hello World!", MWPARSER_UTF8);
   $parser = new_MWPARSER($istream);
   $out = MWParseArticle($parser);
   print implode($out) . "\n";
   MWParserCloseInputStream($istream);

   // Reuse the same parser on a second input stream.
   $istream = MWParserOpenString("input", "{|\n|[[Hello|hello world!]]", MWPARSER_UTF8);
   MWParserReset($parser, $istream);
   $out = MWParseArticle($parser);
   print implode($out) . "\n";
   MWParserCloseInputStream($istream);

which gives the following output:

<p><strong id="hello">Hello World!</strong></p>
<table><tbody><tr><td><!-- BEGIN INTERNAL LINK [Hello] -->hello world!<!-- END INTERNAL LINK --></td></tr></tbody></table>

As you can see, I haven't sorted out the internal link resolution yet.
But there is an efficient solution to this: perform the database lookup
after the lexer has run, but before the parser runs.  This is possible
as all internal links are already known at that stage, and it would
enable the parser to generate the links directly without any
postprocessing.
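A minimal Python sketch of the proposed ordering, assuming a token stream with hypothetical `link_start`/`link_end` token kinds: the link targets are collected once the lexer has finished, resolved with a single batched lookup (simulated here with an in-memory set instead of a database query), and the parser then emits finished links. The URL layout and the `class="new"` marking for missing pages are illustrative only.

    from dataclasses import dataclass


    @dataclass
    class Token:
        kind: str        # hypothetical kinds: "text", "link_start", "link_end"
        value: str = ""  # link target for "link_start", characters for "text"


    def collect_link_targets(tokens):
        """After lexing, every internal link target is already known."""
        return {t.value for t in tokens if t.kind == "link_start"}


    def lookup_existing(titles, page_table):
        """Stand-in for one batched database query over all titles."""
        return {title for title in titles if title in page_table}


    def render(tokens, existing):
        """The parser can now emit final links directly, with no post-pass."""
        out = []
        for t in tokens:
            if t.kind == "link_start":
                css = "" if t.value in existing else ' class="new"'
                out.append(f'<a href="/wiki/{t.value}"{css}>')
            elif t.kind == "link_end":
                out.append("</a>")
            else:
                out.append(t.value)
        return "".join(out)


    tokens = [Token("link_start", "Hello"), Token("text", "hello world!"),
              Token("link_end"), Token("text", " "),
              Token("link_start", "Missing"), Token("text", "red link"),
              Token("link_end")]
    existing = lookup_existing(collect_link_targets(tokens), page_table={"Hello"})
    print(render(tokens, existing))

The point of the ordering is that only one lookup is needed per page, and no HTML placeholders have to be rewritten afterwards.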

Since it doesn't completely replace the current parser, it will take a
bit of surgery to insert it into an instance of MediaWiki.  I haven't
tried this yet.

There is a lot of tedious work left to do before everything is
completed.  For instance, a large part of Sanitizer.php must be ported
over to C in order to validate the html attributes.
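To give an idea of what the attribute validation involves, here is a very small Python sketch of whitelist-based attribute filtering (the port itself would be written in C). The element/attribute table is a hypothetical subset, not the contents of Sanitizer.php, which also validates attribute values such as CSS.

    import html

    # Hypothetical, heavily trimmed whitelist; the real Sanitizer.php tables
    # cover many more elements and attributes, and also check attribute values.
    ALLOWED_ATTRS = {
        "strong": {"id", "class", "title"},
        "span": {"id", "class", "title"},
        "td": {"id", "class", "title", "colspan", "rowspan"},
    }


    def sanitize_attributes(tag, attrs):
        """Keep only whitelisted attributes, then quote and escape the values."""
        allowed = ALLOWED_ATTRS.get(tag, set())
        kept = []
        for name, value in attrs:
            name = name.lower()
            if name in allowed:
                kept.append(f' {name}="{html.escape(value, quote=True)}"')
        return "".join(kept)


    print(sanitize_attributes("strong", [("ID", "hello"), ("onclick", "alert(1)")]))

With the `<strong id=hello>` input from the example above, this yields the quoted ` id="hello"` and drops the event handler.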

Best regards,

/Andreas
Mark Clements (HappyDog) | 7 Sep 2010 01:54

Re: On parsing tokenized wikitext

"Andreas Jonsson" <andreas.jonsson <at> kreablo.se> wrote in message 
news:4C7FD17A.7000906 <at> kreablo.se...
> 2010-09-02 15:15, Mark Clements (HappyDog) wrote:
>> "Andreas Jonsson" <andreas.jonsson <at> kreablo.se>  wrote in message
>> news:4C72738C.80704 <at> kreablo.se...
>>
>>> * The call to the "begin" method is delayed until some actual inlined
>>>    content is produced.  The call is never taken if an "end" event is
>>>    received before such content.
>>>
>> Does this mean that constructs such as <span id="JSPlaceholder"></span> are
>> obliterated by the lexer?  Some empty inline (and block) elements may have
>> an important purpose as a JS DOM hook, and should not be removed from the
>> output stream.
>>
>
> Yes, that is correct.  This is what the original parser does for <i> and
> <b>.  But now when you mention it, I realize that this is probably just
> an artefact of cleaning up the apostrophe mess.
>
> I changed it so that inlined empty html elements are always included.
>

That sounds sensible.  Any HTML inserted manually should be left in place
(possibly tidied - e.g. addition of closing tags - but not removed).  It's
only the generated HTML that should (arguably) be cleaned up in this way.
If the user doesn't want the empty tag, then they can edit the page to
remove it.

- Mark Clements (HappyDog).
Andreas Jonsson | 9 Sep 2010 10:17

Parsing of image links

The syntax of image links with captions is seriously flawed, but I
think that I have found a reasonable solution for handling them: parse
them as "inline blocks".

To make an inline block out of an image link with a caption, we first
let it have its own block context in the lexer, in order to guarantee
the nesting order of internal block elements.  This means that the end
token cannot appear in the wrong block context:

   [[File:example.jpg|<table><td> this ]] is not an end token
   for the image link</table> but this ]] is

I have already discussed image links in the context of speculative
execution in the lexer, which is used to guarantee that any opened image
link will be followed by an image link closing token.  The maximum nesting
level for links is limited to 2 to avoid pathological speculations.
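The effect of giving the image link its own block context can be sketched with a context stack, here in Python. This is an illustration under simplifying assumptions, not the libmwparser lexer: only `[[File:`, `]]` and `<table>`/`</table>` are recognised, the speculative execution that guarantees a closing token exists is not modelled, and links nested beyond the limit are simply treated as text.

    import re

    MAX_LINK_NESTING = 2  # the nesting limit for links mentioned above

    # Only the tokens this sketch cares about; a real lexer has many more.
    TOKEN_RE = re.compile(r"\[\[File:|\]\]|</?table>")


    def image_link_ends(text):
        """Return the offsets of ']]' tokens that really close an image link.

        Each image link pushes its own block context, so a ']]' seen while a
        block element opened inside the caption is still open is plain text.
        """
        stack = []  # block contexts: "image", "table" or "overflow"
        ends = []
        for m in TOKEN_RE.finditer(text):
            tok = m.group()
            if tok == "[[File:":
                too_deep = stack.count("image") >= MAX_LINK_NESTING
                stack.append("overflow" if too_deep else "image")
            elif tok == "<table>":
                stack.append("table")
            elif tok == "</table>":
                if stack and stack[-1] == "table":
                    stack.pop()
            elif stack and stack[-1] in ("image", "overflow"):  # tok is "]]"
                if stack.pop() == "image":
                    ends.append(m.start())
        return ends


    text = ("[[File:example.jpg|<table><td> this ]] is not an end token "
            "for the image link</table> but this ]] is")
    print(image_link_ends(text) == [text.rfind("]]")])  # True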

In the parser, inline blocks may appear in inline text lines.  They
will, however, break the inline text line as far as apostrophe parsing
is concerned.  Since block elements may appear in the image caption,
an image link cannot be part of the lookahead that is performed when
scanning for apostrophes.  This means that in this example:

   text '' italic [[File:example.jpg| text ]] foo '' bar

the text "text '' italic" and the text " foo '' bar" are processed
separately when it comes to apostrophe parsing and the result will be:

<p>text <i> italic</i><a ...><img ..></a>foo <i> bar </i></p>

This differs from the current parser, which produces:

<p>text <i> italic<a ...><img ..></a>foo </i> bar</p>

However, the behavior will be the same regardless of new lines in the
caption:

   text '' italic [[File:example.jpg| text
   text ]] foo '' bar

the result is still:

<p>text <i> italic</i><a ...><img ..></a>foo <i> bar </i></p>

The original parser has problems with this case:

<p>text <i> italic<a ...><img ..></a>foo  bar </i></i></p>

(My guess is that it first renders the </i> inside of the alt
attribute, which is cleaned up in the attribute sanitizing, and then
it discovers that there is a missing </i> and adds that in.)
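A deliberately simplified Python sketch of the segment-wise apostrophe processing described above. The `InlineBlock` type is hypothetical, and the apostrophe handling only toggles italics, so it does not reproduce the real bold/italic rules; it only shows why the two text segments cannot share an open `<i>`.

    from dataclasses import dataclass


    @dataclass
    class InlineBlock:
        html: str  # already-rendered image link, caption included


    def italics(segment):
        """Greatly simplified apostrophe handling: each '' toggles <i>, and an
        italic left open is closed at the end of the segment.  Bold (''') and
        the current parser's heuristics are deliberately ignored."""
        parts = segment.split("''")
        out = [parts[0]]
        for i, part in enumerate(parts[1:], start=1):
            out.append("<i>" if i % 2 == 1 else "</i>")
            out.append(part)
        if len(parts) % 2 == 0:  # odd number of '' markers: one <i> still open
            out.append("</i>")
        return "".join(out)


    def render_line(items):
        """Inline blocks split the line; apostrophes are matched per segment."""
        return "".join(item.html if isinstance(item, InlineBlock) else italics(item)
                       for item in items)


    line = ["text '' italic ", InlineBlock("<a ...><img ..></a>"), " foo '' bar"]
    print(render_line(line))

Each segment closes its own italic, which gives the same shape as the new parser's output shown above, whether or not the caption contains newlines.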

In the original parser, wikitext list elements cannot appear in image
captions.  It would, of course, be very easy to just disable the
wikitext list tokens in the lexer to provide the same behavior, but
this seems a bit inconsistent as any other block element may appear in
the caption.  If we instead, in the parser, push/pop the current list
context to a stack when entering/leaving an "inline block", we can
support lists inside the caption, with the expected behavior in this case:

* list [[File:example.jpg|
* list item in image caption ]]
* continuing outer list
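The push/pop idea can be shown with a small Python sketch. It only handles single-level bullet lists, `ListState` and its methods are hypothetical names, and the `[image: ...]` text merely stands in for whatever the listener renders around the caption.

    class ListState:
        """Single-level bullet lists only.  The state of the current list is
        saved when entering an inline block and restored on leaving it."""

        def __init__(self):
            self.open = False  # is a <ul> with an unclosed <li> open?
            self.saved = []    # list states of the enclosing contexts
            self.out = []

        def item(self, text):
            self.out.append("</li><li>" if self.open else "<ul><li>")
            self.open = True
            self.out.append(text)

        def text(self, s):
            self.out.append(s)

        def close(self):
            if self.open:
                self.out.append("</li></ul>")
                self.open = False

        def enter_inline_block(self):
            self.saved.append(self.open)
            self.open = False  # the caption starts outside any list

        def leave_inline_block(self):
            self.close()       # lists opened in the caption end with it
            self.open = self.saved.pop()


    s = ListState()
    s.item("list ")
    s.text("[image: ")
    s.enter_inline_block()
    s.item("list item in image caption ")
    s.leave_inline_block()
    s.text("]")
    s.item("continuing outer list")
    s.close()
    print("".join(s.out))

The caption gets its own complete list, and the third line continues the outer list rather than the caption's.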

It is up to the listener to decide what to do with the link caption.
Since it is fully parsed, the listening application must be prepared
for this.  In HTML output, the caption is rendered inside the 'alt'
attribute, unless there is a 'frame' or 'thumb' option and no explicit
'alt' option (in which case the caption is completely ignored).  So
the listener should have the ability to toggle rendering of markup on
and off in order to render the caption inside the alt attribute.
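One way a listener could toggle markup is sketched below in Python. `CaptionListener`, the `(kind, value)` event tuples and `image_alt` are hypothetical; the precedence of an explicit `alt` option over the caption is an assumption, and the frame/thumb case follows the description above by not using the caption for the alt text.

    import html


    class CaptionListener:
        """Hypothetical listener whose markup output can be switched off, so a
        fully parsed caption can be flattened into plain text."""

        def __init__(self, markup=True):
            self.markup = markup
            self.out = []

        def begin(self, tag):
            if self.markup:
                self.out.append(f"<{tag}>")

        def end(self, tag):
            if self.markup:
                self.out.append(f"</{tag}>")

        def text(self, s):
            # Element content is escaped here; attribute text is escaped later.
            self.out.append(html.escape(s, quote=False) if self.markup else s)


    def replay(events, listener):
        for kind, value in events:  # e.g. ("begin", "i"), ("text", "x")
            getattr(listener, kind)(value)
        return "".join(listener.out)


    def image_alt(caption_events, options):
        if "alt" in options:  # assumed to take precedence
            return options["alt"]
        if "frame" in options or "thumb" in options:
            return ""         # caption not used for the alt text
        return replay(caption_events, CaptionListener(markup=False))


    def render_img(src, caption_events, options):
        alt = html.escape(image_alt(caption_events, options), quote=True)
        return f'<img src="{src}" alt="{alt}">'


    caption = [("text", "The "), ("begin", "i"), ("text", "yellow"),
               ("end", "i"), ("text", ' "circle"')]
    print(replay(caption, CaptionListener()))             # markup on
    print(render_img("circle-yellow.png", caption, {}))  # markup off, in alt

The same parsed caption events are replayed twice: once with markup for visible output and once as plain text for the attribute.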

/Andreas
Andreas Jonsson | 22 Sep 2010 19:41

Test site for libmwparser

Hi,

I have set up a site for testing my parser implementation:

http://libmwparser.kreablo.se/index.php/Libmwparsertest

Please go ahead and edit.

I have disabled most of the preprocessing, as it seems very hard to
lift out the independent preprocessing from the parser preparation
stuff.  But it should be easy to write a new one with only the
required functionality (which is parser functions, magic words,
comment removal, and transclusion).
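As a rough idea of how small such a preprocessor could start out, here is a Python sketch covering comment removal, simple magic words and argument-less transclusion. Parser functions, template parameters and the real expansion rules are left out, and the function names, the lookup dictionaries and the depth limit are assumptions made for the example.

    import re

    COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
    # Only argument-less {{Name}} calls; parser functions like {{#if:...}}
    # and template parameters are out of scope for this sketch.
    CALL_RE = re.compile(r"\{\{\s*([^{}|#]+?)\s*\}\}")


    def preprocess(text, templates, magic_words, depth=0, max_depth=40):
        """Remove comments, then expand magic words and transclusions."""
        text = COMMENT_RE.sub("", text)
        if depth >= max_depth:  # arbitrary guard against template loops
            return text

        def expand(match):
            name = match.group(1)
            if name in magic_words:
                return magic_words[name]
            if name in templates:
                return preprocess(templates[name], templates, magic_words,
                                  depth + 1, max_depth)
            return match.group(0)  # unknown: leave the call untouched

        return CALL_RE.sub(expand, text)


    templates = {"Greeting": "Hello <!-- hidden -->{{PAGENAME}}!"}
    print(preprocess("{{Greeting}} <!-- c -->done", templates,
                     {"PAGENAME": "Test"}))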

It would still take a lot of work to make a version that could be
substituted in place of the current parser with support for all
features.  But it's a solid proof of concept.

Best regards,

Andreas Jonsson
Jan Paul Posma | 22 Sep 2010 20:42

Re: Test site for libmwparser

How awesome! Is this code already available? Maybe it's a good idea to write to wikitech-l once you publish the source code, because I think most people don't follow this list. This list was created to keep parser discussions away from most developers, who don't care about them, but a milestone like this should be shared there too.

Again, very nice, can't wait to be able to install this myself. ;-)

Regards,
Jan Paul

