Jose | 1 Dec 11:11 2008

Parsing wikitext files

Hi,

I am new to the list. I would like to parse wikitext files from
Wikipedia and have some questions:

1) Which parser gives the most faithful rendering of Wikipedia
content? And which parser is the fastest?

2) If I decide to write my own parser, how can I handle the rendering
of infoboxes/macros? Maybe the idea is to parse the wikitext myself but
handle the macros/infoboxes/templates with a separate library, since I
imagine there are lots of different templates and it would be silly to
try to parse/render each one when existing libraries may already
handle this well.

thanks and regards
jose
Platonides | 1 Dec 15:58 2008

Re: Parsing wikitext files

Jose wrote:
> Hi,
> 
> I am new to the list. I would like to parse wikitext files from
> wikipedia and have some doubts:
> 
> 1) what is the best parser that gets the best rendering for wikipedia
> content ? 

MediaWiki parser.

>And the fastest parser ?

I don't know of any parser comparison, but a parser can be really fast
if you don't mind dropping some features.

> 2) If I decide to write my parser, 

That's a bad idea.

> how can I handle rendering of infoboxes/macros ? 

They're called templates. There's a preprocessing step which substitutes
templates. You then parse the result.
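
The two-step idea (substitute templates first, then parse the result) can be sketched roughly as below. This is only an illustration, not MediaWiki's actual preprocessor: the `TEMPLATES` dictionary, the `expand` function, and the rule of dropping unknown templates and unset parameters are all assumptions made for the example.

```python
import re

# Hypothetical template store: name -> body with {{{param}}} placeholders.
TEMPLATES = {
    "bold": "'''{{{1}}}'''",
    "greet": "Hello, {{bold|{{{name}}}}}!",
}

# An innermost template call: {{ ... }} with no braces inside.
CALL = re.compile(r"\{\{([^{}]*)\}\}")
# A parameter placeholder: {{{name}}} or {{{name|default}}}.
PARAM = re.compile(r"\{\{\{([^{}|]+)(?:\|([^{}]*))?\}\}\}")


def expand(text, templates=TEMPLATES, max_passes=20):
    """Substitute template calls, innermost first, until none remain."""

    def substitute(match):
        parts = match.group(1).split("|")
        body = templates.get(parts[0].strip())
        if body is None:
            return ""  # assumption: unknown templates are simply dropped
        args = {}
        position = 1
        for part in parts[1:]:
            if "=" in part:  # named argument: key=value
                key, value = part.split("=", 1)
                args[key.strip()] = value.strip()
            else:  # positional argument: 1, 2, 3, ...
                args[str(position)] = part
                position += 1
        # Fill placeholders; unset ones fall back to their default or "".
        return PARAM.sub(
            lambda m: args.get(m.group(1).strip(), m.group(2) or ""), body
        )

    for _ in range(max_passes):  # bounded, so recursive templates terminate
        text, count = CALL.subn(substitute, text)
        if count == 0:
            break
    return text
```

With these made-up definitions, `expand("{{greet|name=World}}")` yields `Hello, '''World'''!`, which is plain wikitext that a parser can then handle without knowing anything about templates.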

> Maybe the idea is to parse the wikitext but handle the
> macros/infoboxes/templates with a separate library (as I imagine
> there are lots of different templates and it might be silly to try
> to parse/render each one

Jose | 2 Dec 15:26 2008

Re: Parsing wikitext files

>> 2) If I decide to write my parser,
>
> That's a bad idea.

I just found wikiprep, which may be what I want.

Is there any code available to quickly extract just the tokens from
wikitext, e.g. to feed an inverted index?
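
A rough sketch of what such token extraction could look like, assuming markup is stripped with a few regular expressions rather than a real parser. The `tokenize` and `build_index` names are hypothetical, templates are discarded instead of expanded, and tables and many other constructs are not handled:

```python
import re
from collections import defaultdict


def tokenize(wikitext):
    """Return lowercase word tokens from wikitext, ignoring most markup."""
    text = re.sub(r"<!--.*?-->", " ", wikitext, flags=re.DOTALL)  # comments
    previous = None
    while previous != text:  # drop templates, innermost first
        previous = text
        text = re.sub(r"\{\{[^{}]*\}\}", " ", text)
    text = re.sub(r"<[^>]+>", " ", text)  # HTML-like tags
    # [[target|label]] -> label, [[target]] -> target
    text = re.sub(r"\[\[(?:[^\[\]|]*\|)?([^\[\]]*)\]\]", r"\1", text)
    # [http://example.org label] -> label
    text = re.sub(r"\[https?://\S+\s*([^\]]*)\]", r"\1", text)
    return re.findall(r"\w+", text.lower())


def build_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, wikitext in docs.items():
        for token in tokenize(wikitext):
            index[token].add(doc_id)
    return index
```

For instance, `build_index({"a": "[[France]] and Paris", "b": "Paris {{x}}"})` maps `paris` to both documents and `france` only to `a`.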

thanks
