Tomasz Wegrzanowski | 1 Dec 2002 02:00

TeX

Here is the first version of the TeX rendering extension for Wikipedia.
It's not production code yet.

Please comment.

= How does it work =

A new preference is introduced which says whether to:
* always render formulas as PNGs
* render them as HTML if they are simple enough, or as PNGs otherwise
* leave them as pseudo-TeX (mainly for text browsers where neither PNG
  nor HTML rendering would be visible)
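
A minimal sketch of how such a three-way preference could be applied to one formula. It is written in Python for illustration, not in the PHP of the actual patch, and the names MathPref and choose_rendering are hypothetical. It assumes the HTML rendering is passed as "" when the formula is too complex for HTML.

```python
from enum import Enum


class MathPref(Enum):
    """Hypothetical names for the three preference values."""
    ALWAYS_PNG = 0
    HTML_IF_SIMPLE = 1
    LEAVE_AS_TEX = 2


def choose_rendering(pref: MathPref, html: str) -> str:
    """Return 'png', 'html' or 'source' for one formula.

    `html` is the HTML rendering, or "" when no HTML rendering exists.
    """
    if pref is MathPref.LEAVE_AS_TEX:
        return "source"
    if pref is MathPref.HTML_IF_SIMPLE and html:
        return "html"
    # ALWAYS_PNG, or HTML was requested but the formula is too complex.
    return "png"
```

With this shape, a user who prefers HTML still gets a PNG for any formula that has no HTML rendering.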

ISSUE 1: While HTML reduces bandwidth, it is much uglier, so the default is PNG-only.
ISSUE 2: PNGs are rendered with a font that is "a bit too big". That's on purpose.
         A bit too big works well at high and medium resolutions, and is still readable
         at low resolution. But "a bit too little" at high resolution would be very
         hard to read.

A new table is also introduced:
CREATE TABLE math (
    math_inputhash char(32) NOT NULL,
    math_outputhash char(32) NOT NULL,
    math_html text NOT NULL,
    UNIQUE KEY math_inputhash (math_inputhash)
);

math_inputhash is the MD5 of the input markup, math_outputhash
is the MD5 of the output markup, and math_html is the HTML rendering,
or "" if the formula is too difficult for HTML.
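
A rough sketch of the lookup this table suggests: hash the input markup, reuse the stored row if there is one, otherwise render and store. It uses Python with an in-memory SQLite database instead of the real PHP/MySQL setup. The function names and the `render` callable are assumptions made for illustration, not part of the patch.

```python
import hashlib
import sqlite3


def md5_hex(text: str) -> str:
    """32-character hex MD5, matching the char(32) columns."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def open_cache() -> sqlite3.Connection:
    """Create an in-memory stand-in for the math table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE math ("
        " math_inputhash CHAR(32) NOT NULL UNIQUE,"
        " math_outputhash CHAR(32) NOT NULL,"
        " math_html TEXT NOT NULL)"
    )
    return conn


def lookup_or_render(conn, tex, render):
    """Return (math_outputhash, math_html) for `tex`.

    `render` is a hypothetical callable: tex -> (output_markup, html),
    where html is "" if the formula is too difficult for HTML.
    """
    key = md5_hex(tex)
    row = conn.execute(
        "SELECT math_outputhash, math_html FROM math WHERE math_inputhash = ?",
        (key,),
    ).fetchone()
    if row is not None:
        return row  # cache hit: the renderer is not called again
    output_markup, html = render(tex)
    row = (md5_hex(output_markup), html)
    conn.execute("INSERT INTO math VALUES (?, ?, ?)", (key, *row))
    return row
```

The UNIQUE constraint on math_inputhash is what makes the input hash usable as a cache key.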

Tomasz Wegrzanowski | 1 Dec 2002 02:36

Re: TeX

ISSUE 15: texvc is written in Ocaml

Ocaml is the best available language for writing interpreters for
special-purpose languages because:
* it is really fast, comparable in performance to "traditional" compiled
  languages with GC like Java, and in cases where GC doesn't introduce
  much overhead, to C/C++
* it doesn't segfault
* it has yacc/lex
* it has advanced symbolic processing functionality
* it can do a lot to ensure correctness of programs, and programs in Ocaml
  are very easy to reason about (texvc doesn't contain a single variable)
* programs can be written at a much higher level, so development is faster,
  yet low-level functionality is also available if needed.

I hope this explanation is enough for you.

To learn more about Ocaml visit Polish Wikipedia, in particular:
http://pl.wikipedia.org/wiki/Ocaml
http://pl.wikipedia.org/wiki/Ocamlyacc

ISSUE 16: this pseudo-TeX is very incomplete. Please tell me what
          functionality you need (sums/integrals come to mind ...).

ISSUE 17: texvc uses double-dollar math, not single-dollar math. It
          uses more space but looks nicer.

Jonathan Walther | 1 Dec 2002 08:36

more on namespaces

Namespaces appear to exist so that actual encyclopedia articles can be
distinguished from everything else. Instead of namespaces, could
Wikipedia live with a flag that said "This is an article" for each
article?

Jonathan

-- 
                     Geek House Productions, Ltd.

  Providing Unix & Internet Contracting and Consulting,
  QA Testing, Technical Documentation, Systems Design & Implementation,
  General Programming, E-commerce, Web & Mail Services since 1998

Phone:   604-435-1205
Email:   djw@...
Webpage: http://reactor-core.org
Address: 2459 E 41st Ave, Vancouver, BC  V5R2W2

Jens Frank | 1 Dec 2002 08:51

Re: more on namespaces

On Sat, Nov 30, 2002 at 11:36:04PM -0800, Jonathan Walther wrote:
> Namespaces appear to exist so that actual encyclopedia articles can be
> distinguished from everything else. Instead of namespaces, could
> Wikipedia live with a flag that said "This is an article" for each
> article?
> 
What is the problem you have with namespaces? Namespaces provide
much more information than a binary article/no article. Page layout
is different for talk pages than for user pages: user pages have links
like "user contributions". Talk pages do not. And image pages
behave very differently from the above. That's why there are namespaces.

JeLuF

Jonathan Walther | 1 Dec 2002 08:52

Re: more on namespaces

On Sun, Dec 01, 2002 at 08:51:48AM +0100, Jens Frank wrote:
>What is the problem you have with namespaces? Namespaces provide
>much more information than a binary article/no article. Page layout
>is different for talk pages than for user pages: user pages have links
>like "user contributions". Talk pages do not. And image pages
>behave very differently from the above. That's why there are namespaces.

Thank you Jens.  That's the kind of information I was looking for.

Jonathan

Jens Frank | 1 Dec 2002 09:22

Re: TeX

On Sun, Dec 01, 2002 at 02:00:26AM +0100, Tomasz Wegrzanowski wrote:
> Here is the first version of the TeX rendering extension for Wikipedia.
> It's not production code yet.
> 
> Please comment.

Hello taw,

I just looked at your diff. Making HTML rendering an option is
a good idea.

One thing I would strongly propose to change:

function renderMath( $matches )
{
   ...
       $pid = popen ("./math/texvc \"{$tex}\"", "r"); # texvc shouldn't be in cgi-bin

This allows nasty attacks before the TeX code is validated. Let, for
example, $tex be $(find / -type f|xargs rm)
popen then starts a shell to run the program, and the shell expands the
parameters. A lot of nasty things could be done this way.

Workaround:
a) use a bi-directional proc_open and pass $tex via stdin
b) create a file with the md5 hash as its filename.

Workaround (a) is currently not available in standard PHP.
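
To illustrate the idea behind workaround (a), here is a small sketch in Python rather than PHP. The program is started from an explicit argument list, so no shell is involved, and the untrusted text travels over stdin. The child process below is only a stand-in that echoes its input, since texvc itself is not assumed to be installed. The payload is a harmless command substitution, not the destructive one above.

```python
import subprocess
import sys


def run_without_shell(argv, stdin_text=""):
    """Run a program from an argument list; no shell parses the input."""
    result = subprocess.run(
        argv,
        input=stdin_text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Stand-in for texvc (an assumption for this sketch): echo stdin to stdout.
ECHO_STDIN = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read())",
]

# A shell would replace this with the output of `echo injected`.
hostile = "$(echo injected)"
echoed = run_without_shell(ECHO_STDIN, hostile)
```

Because the text is never handed to a shell, `echoed` holds the payload exactly as it was written, and nothing is expanded or executed.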

Regarding functions to be provided:

Jonathan Walther | 1 Dec 2002 10:14

dealing with deletions

What is the desired behavior?  If someone creates an article that was
previously deleted, should the article's previous history get restored?

Brion VIBBER | 1 Dec 2002 10:34

Re: dealing with deletions

Jonathan Walther wrote:
> What is the desired behavior?  If someone creates an article that was
> previously deleted, should the article's previous history get restored?

Currently, the previous history is only restored if it is restored from 
the deleted pages archive via Special:Undelete. If a new page was since 
created with the title, the old revisions are simply integrated into the 
existing history (generally at the end -- but if the new page was 
renamed from an older title, it's possible that the histories could 
intermix when presented sorted by date).

Here's a diagram of what exists in what tables over the lifetime of such 
an event:

Page creation:
rev A -> cur

Later edit:
rev B -> cur
rev A -> old

Deletion:
rev B -> archive (hidden)
rev A -> archive (hidden)

New creation with same title:
rev C -> cur
rev B -- archive (hidden)
rev A -- archive (hidden)
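
The four stages above can be replayed with a toy model. This is a Python sketch under simplifying assumptions: each table is reduced to a dictionary keyed by title, the class and method names are hypothetical, and undeletion is not modelled.

```python
class PageStore:
    """Toy stand-in for the cur, old and archive tables."""

    def __init__(self):
        self.cur = {}      # title -> current revision
        self.old = {}      # title -> earlier revisions, oldest first
        self.archive = {}  # title -> hidden revisions of deleted pages

    def save(self, title, rev):
        """Create or edit a page; the previous current revision moves to old."""
        if title in self.cur:
            self.old.setdefault(title, []).append(self.cur[title])
        self.cur[title] = rev

    def delete(self, title):
        """Move every revision of the page into the hidden archive."""
        revs = self.old.pop(title, []) + [self.cur.pop(title)]
        self.archive.setdefault(title, []).extend(revs)


store = PageStore()
store.save("Foo", "A")   # page creation
store.save("Foo", "B")   # later edit
store.delete("Foo")      # deletion
store.save("Foo", "C")   # new creation with same title
```

After the last step the new page has only rev C in cur, while rev A and rev B stay in the archive untouched.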

Tomasz Wegrzanowski | 1 Dec 2002 13:18

Re: TeX

On Sun, Dec 01, 2002 at 09:22:14AM +0100, Jens Frank wrote:
> One thing I would strongly propose to change:
> 
> function renderMath( $matches )
> {
>    ...
>        $pid = popen ("./math/texvc \"{$tex}\"", "r"); # texvc shouldn't be in cgi-bin
> 
> 
> This allows nasty attacks before the TeX-code is validated. Let, for 
> example, $tex be $(find / -type f|xargs rm)
> Then popen starts a shell to start the program and its parameters are expanded by
> the shell. A lot of nasty things could be performed this way.
> 
> Workaround: 
> a) use a bi-directional proc_open and put the $tex via stdin 
> b) create a file with the md5-hash as filename.
> 
> Workaround (a) is currently not available in standard PHP.

PHP has a standard function that escapes shell metacharacters,
exactly for this purpose.

I just forgot to use it there.
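
The function is not named here; PHP's escapeshellarg() is presumably the kind of thing meant. As a rough Python analogue, shlex.quote() wraps an argument so that a POSIX shell treats it as a single literal word. The helper name quoted_command is hypothetical, and the payload is a harmless command substitution.

```python
import shlex


def quoted_command(program: str, argument: str) -> str:
    """Build a shell command line with the argument quoted as one word."""
    return f"{program} {shlex.quote(argument)}"


# Inside single quotes the shell performs no command substitution.
command = quoted_command("./math/texvc", "$(echo injected)")
```

Quoting only protects the one argument it is applied to, so it has to be used at every place where user input reaches the command line.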

Axel Boldt | 2 Dec 2002 02:06

Re: TeX

I wonder if it would be beneficial to skip the external parsing step
and leave all parsing to TeX. The advantage would be that all TeX
functionality is immediately available without any additional work,
including macro packages such as those for
* commutative diagrams and graphs (xypic)
* chemical structure formulas (chemtex)
* music scores (musictex)
* chess diagrams

There is no safety issue, since TeX can be made to run safely so that
no shell processes can be called and only the standard files can be
generated.

The only disadvantage I see is that we would lose the optional HTML
rendering of formulas. While I like that feature, I think we can live
without it: most advanced formulas cannot be rendered in HTML anyway,
and those that can should probably be written in HTML in the first
place to be nice to anonymous users with non-graphical browsers.

Axel

---
Payment: http://www.wikipedia.org/wiki/K%F6nig%27s_lemma
