Re: Automatic text direction (fwd)
Behdad Esfahbod <behdad <at> cs.toronto.edu>
2005-03-18 23:39:59 GMT
---------- Forwarded message ----------
Date: Thu, 17 Mar 2005 19:26:39 -0500 (EST)
From: Behdad Esfahbod <behdad <at> cs.toronto.edu>
To: E L <nakeee <at> gmail.com>
Cc: bidi <at> freedekstop.org
Subject: Re: [Bidi] Automatic text direction
[Ely, it seems you forgot to CC the list. Doing that in this reply.]
On Mon, 14 Mar 2005, E L wrote:
> > * Unicode Bidirectional Algorithm (UBA from now on on this list),
> > for paragraphs of plain text. This is already implemented in
> > GNU FriBidi and used by Pango, in the GNOME stack, for example.
> Does KDE use the same algo? OpenOffice? Java?
Yes, they all do that. UBA is the one piece of the puzzle that
everybody accepts, since it's part of the Unicode standard.
That said, I'd guess the Unicode Character Database data shipped in
different stacks differs quite a bit.
> > * Some low-level markup is defined for better bidi override, etc,
> > like the unicode-bidi property in CSS for example. In the GNOME
> > stack this goes inside Pango. We don't have it yet. Got to
> > design. Dov brought the issue up some time ago on this list
> > IIRC. This should solve the GAIM problem above.
> It's not only markup that is the problem; it's also gaim or IRC, where
> your nick is in English and you talk in Hebrew or Arabic.
The gaim and IRC messages are, again, not plain text. Once we have
the bidi markup defined and implemented, in Pango say, gaim
simply surrounds the nick with some markup saying this part should
be ignored when determining paragraph direction. It's kind of a
paragraph embedding markup. Dov posted a proposal about that
some time ago; he was proposing the addition of such a character to
the Unicode standard, but I believe we should deprecate the
stateful Unicode bidi format characters in user text, and instead
define our own bidi markup. The Unicode format characters can be
used in the underlying implementation of the bidi markup, though
that alone is not enough: special handling (like skipping this part
when determining paragraph direction) is still needed.
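
To make the "skip this part" idea concrete, here is a rough Python
sketch. It is not how Pango or FriBidi do it. It assumes a simplified
first-strong rule (the first character whose bidi class is L, R or AL
decides) and ignores embeddings and overrides entirely. The name
paragraph_direction and the ignored_spans parameter are made up for
illustration; they stand in for whatever markup ends up being defined.

```python
import unicodedata

# Strong bidi classes: L is left-to-right, R and AL are right-to-left.
STRONG_LTR = {"L"}
STRONG_RTL = {"R", "AL"}


def paragraph_direction(text, ignored_spans=(), default="ltr"):
    """Return "ltr" or "rtl" from the first strong character of text.

    ignored_spans is a hypothetical stand-in for the proposed markup:
    (start, end) index pairs, end exclusive, that are skipped.
    """
    for index, char in enumerate(text):
        # Skip characters the application marked as "not part of the text".
        if any(start <= index < end for start, end in ignored_spans):
            continue
        bidi_class = unicodedata.bidirectional(char)
        if bidi_class in STRONG_LTR:
            return "ltr"
        if bidi_class in STRONG_RTL:
            return "rtl"
    # No strong character found: fall back to the caller's default.
    return default


# An English nick followed by a Hebrew message (U+05E9 U+05DC U+05D5 U+05DD).
message = "nakeee: \u05e9\u05dc\u05d5\u05dd"
nick_span = (0, len("nakeee"))

plain = paragraph_direction(message)
marked = paragraph_direction(message, ignored_spans=[nick_span])
```

Without the span the Latin nick wins and the paragraph comes out
left-to-right; with the nick marked as ignored, the Hebrew text decides.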
> > * The algorithm implemented by Dov in GNOME, with some
> > refinements, for determining direction of paragraphs in a plain
> > text document. This works pretty well most of the time. Some
> > people say it makes them less productive, but almost all the
> > examples they give are of markup languages, not plain text.
> Almost, but I just gave 2 which are not :)
If we have a markup for bidi formatting, it's all a matter of
adding markup in the right applications. Quite like what you can
do in HTML/CSS, but more powerful.
> There are 2 ways to do it. One would be a manual override, which should
> be done anyhow, because you can't catch all cases no matter what.
> The other would be to provide the ability to ignore the first few words,
> like checking from the second word on or so.
> > * Seems like we need a layer here to allow manually overriding
> > paragraph directions, etc...
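
The "check from the second word on" idea could look roughly like the
Python sketch below. It is only an illustration under the same
simplified first-strong assumption (first character of bidi class L, R
or AL decides); direction_after_words and skip_words are hypothetical
names, not an existing API.

```python
import unicodedata


def direction_after_words(text, skip_words=1, default="ltr"):
    """First-strong direction of text, ignoring the first skip_words words."""
    # Words are split on whitespace; the skipped ones never get inspected.
    remaining = " ".join(text.split()[skip_words:])
    for char in remaining:
        bidi_class = unicodedata.bidirectional(char)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    # Nothing strong after the skipped words: use the default.
    return default


# An English nick followed by an Arabic word (U+0645 U+0631 U+062D U+0628 U+0627).
line = "nakeee: \u0645\u0631\u062d\u0628\u0627"
skipping_nick = direction_after_words(line)
whole_line = direction_after_words(line, skip_words=0)
```

This is a heuristic: it guesses where the nick ends, which is why the
manual override is still needed on top of it.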
OK, do you think we should start an fd.o spec stub and go on by
filling it in? It seems like we have a big picture and lots of ideas.
Or do we still need to continue on the list and perhaps gather
consensus on the Wiki before doing that?
> > * For non-plain text, like markup languages (XML, HTML, ...), and
> > standardize the data that a highlighting engine can use for
> > proper rendering. See my comment in this feature request on
> > gtksourceview for details:
> > http://bugzilla.gnome.org/show_bug.cgi?id=168108
> Sounds good:)
> > I believe if we get to implement these all, we're almost there.
> > Remains exotic things like LaTeX source, etc, that cannot be
> > really handled anyway in an editor.
> ARRMMM :) I wouldn't say that. There are more than a few projects on making
> LaTeX have good bidi support; I'm sure we can figure something out.
What I meant is that you cannot easily determine paragraph
direction from the LaTeX source, unless you are LaTeX itself.
But for specific bidi LaTeX implementations (like ivritex, or
ArabTeX, or FarsiTeX, or Fanoos, ...), sure, the same
highlighting technique applies to some level.