Eli Zaretskii | 9 Oct 2009 23:18

Bidirectional editing in Emacs -- main design decisions

As some of you know, I'm slowly working on adding support for
bidirectional editing in Emacs.  (Before you ask: the code is not
publicly available yet, and won't be until Emacs switches to bzr as
its main VCS.)

While there's a lot of turf to be covered yet, I thought I'd publish
the main design decisions up to this point.  Many of these decisions
were discussed at length years ago on the emacs-bidi mailing list, and
since then I have also talked them over in private email with a few people.
Other decisions were made recently, as I went about changing the
display engine.

My goal, and the main drive behind these design decisions, was to
preserve as much as possible the basic assumptions and design
principles of the current Emacs display engine.  This is not just
opportunism; I firmly believe that any other way would mean a total
redesign and rewrite of the display engine, which is something we want
to avoid.  Personally, if such a redesign were necessary, I
couldn't participate in that endeavor, except as an advisor.

With that preamble out of the way, here's what I can tell about the
subject at this point:

1. Text storage

   Bidirectional text in Emacs buffers and strings is stored in strict
   logical order (a.k.a. "reading order").  This is how most (if not
   all) other implementations handle bidirectional text.  The
   advantage of this is that file and process I/O is trivial, as well
   as text search.  The disadvantage is that text needs to be
[...]

Eli Zaretskii | 10 Oct 2009 00:29

Re: Bidirectional editing in Emacs -- main design decisions

> From: joakim <at> verona.se
> Cc: emacs-devel <at> gnu.org, emacs-bidi <at> gnu.org
> Date: Fri, 09 Oct 2009 23:55:19 +0200
> 
> It works mostly the same as embedding images. From what you're
> writing below it sounds like the display of images will work as
> before, therefore my patch will apply, hopefully nicely, on top of
> bidi. Correct?

Correct.  Images and any other objects will be reordered according to
UAX#9, and as a single entity.  IOW, imagine that instead of the
embedded widget the buffer has a single character U+FFFC (OBJECT
REPLACEMENT CHARACTER).  The reordering code will treat the embedded
widget as it would treat that character.  That means, in particular,
that if the widget is embedded in text written in some right-to-left
script, the text that precedes the widget will be on the right of the
widget, and text that follows it will be on the left.
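A minimal sketch of that behavior in Python, under stated assumptions: it
is not the Emacs reordering code, and it handles only a run whose
characters are all right-to-left letters or neutrals, where UAX#9
reduces to reversing the run.  The helper name `visual_order_pure_rtl`
is hypothetical.

```python
import unicodedata

OBJECT = "\uFFFC"  # OBJECT REPLACEMENT CHARACTER, standing in for the widget


def visual_order_pure_rtl(logical: str) -> str:
    """Return the left-to-right screen order of an all-RTL run.

    Assumption: every character is a strong RTL letter (R, AL) or an
    Other Neutral (ON).  In that case the neutrals take the direction
    of their RTL neighbours and the whole run is simply reversed.
    Anything else needs the full algorithm and is rejected here.
    """
    for ch in logical:
        if unicodedata.bidirectional(ch) not in ("R", "AL", "ON"):
            raise ValueError("sketch handles only pure right-to-left runs")
    return logical[::-1]


before = "\u05D0\u05D1"  # alef, bet: text preceding the widget
after = "\u05D2\u05D3"   # gimel, dalet: text following the widget

visual = visual_order_pure_rtl(before + OBJECT + after)
pos = visual.index(OBJECT)
left_of_widget = visual[:pos]
right_of_widget = visual[pos + 1:]
```

Here `right_of_widget` holds the characters that preceded the widget in
the buffer, and `left_of_widget` holds the ones that followed it.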

Eli Zaretskii | 10 Oct 2009 00:41

Re: Bidirectional editing in Emacs -- main design decisions

> Date: Fri, 09 Oct 2009 23:18:00 +0200
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: 
> 
>    So I decided to use such a higher protocol -- namely,
>    the Emacs definition of a paragraph, as determined by the
>    `paragraph-start' and `paragraph-separate' regexps.

A small, but significant correction to this: these two regexps are
looked for anchored at line beginning.

The reasons for this deliberate deviation from the letter of the Emacs
definition of a paragraph are complicated, but the upshot is that, from
the user's point of view, it does not make sense to change paragraph
direction if the paragraph separator does not begin at the beginning
of a line.

As another deviation from the definition of a paragraph, text that
matches `paragraph-separate' is given the same direction as the
preceding paragraph.  (By contrast, Emacs generally does not consider
`paragraph-separate' as part of any paragraph.)
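A rough Python sketch of these two deviations, for illustration only.
The separator pattern below is an assumed stand-in for
`paragraph-separate' (a blank line), and picking the direction from the
first strong character follows UAX#9 rules P2 and P3; neither is taken
from the actual implementation.  All function names are hypothetical.

```python
import re
import unicodedata

# Assumed stand-in for `paragraph-separate': an empty or whitespace-only line.
PARAGRAPH_SEPARATE = re.compile(r"[ \t]*$")


def first_strong_direction(line):
    """Return "L2R" or "R2L" from the first strong character, else None."""
    for ch in line:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "L2R"
        if bidi_class in ("R", "AL"):
            return "R2L"
    return None


def line_directions(text, default="L2R"):
    """Assign a base direction to every line of TEXT.

    Simplification: only the first line of each paragraph is inspected
    for a strong character.
    """
    directions = []
    current = default
    at_paragraph_start = True
    for line in text.split("\n"):
        # re.match anchors the pattern at the beginning of the line.
        if PARAGRAPH_SEPARATE.match(line):
            # A separator line gets the direction of the preceding paragraph.
            directions.append(current)
            at_paragraph_start = True
            continue
        if at_paragraph_start:
            current = first_strong_direction(line) or default
            at_paragraph_start = False
        directions.append(current)
    return directions


sample = "\u05E9\u05DC\u05D5\u05DD\n\nhello\nworld\n   \n\u05D0"
result = line_directions(sample)
```

In `result`, the blank line after the Hebrew paragraph is right-to-left
and the whitespace-only line after "world" is left-to-right, because
each separator inherits from the paragraph before it.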

Eli Zaretskii | 10 Oct 2009 09:08

Re: Bidirectional editing in Emacs -- main design decisions

> From: joakim <at> verona.se
> Cc: emacs-devel <at> gnu.org, emacs-bidi <at> gnu.org
> Date: Sat, 10 Oct 2009 00:42:39 +0200
> 
> Presumably I will also need to tell the widget to render its own text in
> bidi mode. 

By itself, or by using Emacs facilities?

Eli Zaretskii | 10 Oct 2009 10:20

Re: Bidirectional editing in Emacs -- main design decisions

> From: joakim <at> verona.se
> Cc: emacs-devel <at> gnu.org, emacs-bidi <at> gnu.org
> Date: Sat, 10 Oct 2009 09:28:19 +0200
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> >> From: joakim <at> verona.se
> >> Cc: emacs-devel <at> gnu.org, emacs-bidi <at> gnu.org
> >> Date: Sat, 10 Oct 2009 00:42:39 +0200
> >> 
> >> Presumably I will also need to tell the widget to render its own text in
> >> bidi mode. 
> >
> > By itself, or by using Emacs facilities?
> 
> They are gtk widgets. I haven't looked at it closely, but I presume you
> tell the gtk widgets which locale to render text in.

Actually, I'd expect gtk widgets to do this automatically, no matter
in what locale.  The text they render should supply the hint.

> So, by itself, yes.

You can do that, Emacs won't care.

Eli Zaretskii | 10 Oct 2009 16:06

Re: Bidirectional editing in Emacs -- main design decisions

> From: Sascha Wilde <wilde <at> sha-bang.de>
> Cc: emacs-devel <at> gnu.org,  emacs-bidi <at> gnu.org
> Date: Sat, 10 Oct 2009 15:44:19 +0200
> 
> Eli Zaretskii <eliz <at> gnu.org> wrote:
> > 8. User control of visual order
> [...]
> >    Emacs could
> >    have a command called, say, `make-paragraph-left-to-right' that did
> >    its job simply by inserting LRM at the beginning of the paragraph.
> 
> I would suggest that Emacs should also have a way to visualize the
> otherwise invisible text direction marks so that:
> 
> - it becomes transparent to the user whether the direction of a
>   specific portion of the text is explicitly or implicitly defined
> 
> - the user is provided with a simple way to remove the marks in case he
>   wants to.  (By simply deleting them the way he would do with any other
>   regular character.)

Yes.  That's what this part of my longish message was trying to say:

   In addition, being able to show these formatting codes to the user
   is a valuable feature, because the way reordered text looks might
   not be otherwise understood or changed easily.

The "direction marks" you mention are the "formatting codes" I wrote
about.
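To make the idea concrete, here is a small Python sketch, offered as an
illustration rather than as how Emacs does or will do it.  The table
lists a subset of the UAX#9 formatting characters chosen for the
example, and the three function names are hypothetical; the first
mirrors the `make-paragraph-left-to-right' command suggested above.

```python
# Subset of the UAX#9 formatting characters, chosen for illustration.
FORMATTING_CODES = {
    "\u200E": "LRM",  # LEFT-TO-RIGHT MARK
    "\u200F": "RLM",  # RIGHT-TO-LEFT MARK
    "\u202A": "LRE",  # LEFT-TO-RIGHT EMBEDDING
    "\u202B": "RLE",  # RIGHT-TO-LEFT EMBEDDING
    "\u202C": "PDF",  # POP DIRECTIONAL FORMATTING
    "\u202D": "LRO",  # LEFT-TO-RIGHT OVERRIDE
    "\u202E": "RLO",  # RIGHT-TO-LEFT OVERRIDE
}


def make_paragraph_left_to_right(paragraph: str) -> str:
    """Force a left-to-right base direction by prepending LRM."""
    return "\u200E" + paragraph


def show_formatting_codes(text: str) -> str:
    """Replace each invisible formatting code with a visible <NAME> tag."""
    return "".join(
        "<" + FORMATTING_CODES[ch] + ">" if ch in FORMATTING_CODES else ch
        for ch in text
    )


def strip_formatting_codes(text: str) -> str:
    """Delete the formatting codes, as a user could once they are visible."""
    return "".join(ch for ch in text if ch not in FORMATTING_CODES)


marked = make_paragraph_left_to_right("\u05D0\u05D1 abc")
shown = show_formatting_codes(marked)
cleaned = strip_formatting_codes(marked)
```

With the codes shown, an explicitly marked paragraph is distinguishable
from one whose direction is implicit, and removing the mark is an
ordinary deletion.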
[...]

Ehud Karni | 10 Oct 2009 16:57

Re: Bidirectional editing in Emacs -- main design decisions

On Fri, 09 Oct 2009 23:18:00 Eli Zaretskii wrote:
>
> Here's what I can tell about the subject (bidi display) at this point

In general I agree with your decisions.

> 1. Text storage
>
>    Bidirectional text in Emacs buffers and strings is stored in strict
>    logical order (a.k.a. "reading order").  This is how most (if not
>    all) other implementations handle bidirectional text.  The
>    advantage of this is that file and process I/O is trivial, as well
>    as text search.  [snip]

The search has many problems but this should not influence your bidi
reordering. The changes to various search functions can be done later.

The user ALWAYS searches for the visual text s/he sees (s/he never knows
the logical order unless s/he visits the file literally).

The problems have many causes:
  1. Different logical inputs, even without formatting characters, can
     result in the same visual output.
     e.g. Logical Hebrew text + a number in LTR reading order, the
     number may be before or after the Hebrew text, but in the visual
     output the number will always be after (to the left of) the text.
     Logical "123 HEBREW 456" appears as "123 456 WERBEH".
  2. Formatting characters are not seen and should not be searched.
  3. The visual appearance of the searched string may be different from
     what it will match.  e.g. The search for logical "HEBREW 3." in
[...]

Eli Zaretskii | 10 Oct 2009 18:38

Re: Bidirectional editing in Emacs -- main design decisions

> Date: Sat, 10 Oct 2009 16:57:59 +0200
> From: "Ehud Karni" <ehud <at> unix.mvs.co.il>
> Cc: emacs-bidi <at> gnu.org, emacs-devel <at> gnu.org
> 
> On Fri, 09 Oct 2009 23:18:00 Eli Zaretskii wrote:
> >
> > Here's what I can tell about the subject (bidi display) at this point
> 
> In general I agree with your decisions.

Well, you brought up many of them (thanks!), so it isn't surprising ;-)

> The search has many problems but this should not influence your bidi
> reordering. The changes to various search functions can be done later.

Agreed.

> The user ALWAYS searches for the visual text s/he sees (s/he never knows
> the logical order unless s/he visits the file literally).

She will look for visual text, but she will type the text she looks
for in the logical (reading) order, not in the visual order, where
characters are reversed and/or reshuffled.
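A small Python illustration of that point, using a Hebrew word and the
same layout as the "123 HEBREW 456" example.  It only shows why typing
in reading order matches logically stored text; the function name is
hypothetical and none of the other search issues are addressed.

```python
# The Hebrew word "shalom", in logical (reading) order.
SHALOM = "\u05E9\u05DC\u05D5\u05DD"

# Buffer contents as stored: number, Hebrew word, number.
buffer_text = "123 " + SHALOM + " 456"


def search_logical(haystack: str, needle: str) -> int:
    """Plain substring search over logically ordered text; -1 if absent."""
    return haystack.find(needle)


# What the user types: the word first, then the number, as it is read.
typed_in_reading_order = SHALOM + " 456"

# What appears on screen, left to right: the number, then the reversed word.
typed_in_visual_order = "456 " + SHALOM[::-1]

hit = search_logical(buffer_text, typed_in_reading_order)
miss = search_logical(buffer_text, typed_in_visual_order)
```

The string typed in reading order is found at offset 4, while the
left-to-right transcription of the screen is not found at all.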

> The problems have many causes:
>   1. Different logical inputs, even without formatting characters, can
>      result in the same visual output.
>      e.g. Logical Hebrew text + a number in LTR reading order, the
>      number may be before or after the Hebrew text, but in the visual
>      output the number will always be after (to the left of) the text.
[...]

James Cloos | 10 Oct 2009 19:18

Re: Bidirectional editing in Emacs -- main design decisions

>>>>> "Eli" == Eli Zaretskii <eliz <at> gnu.org> writes:

Eli> I'm slowly working on adding support for bidirectional editing in Emacs.

Thanks for posting that.  It is a great summary of the concerns and
needs of an editor when dealing with bidi text.

To be fair, I should point out before continuing that I do not read any
rtl scripts.  My interests deal with fonts and typography and at least
seeing bidi email in its correct visual order, if only to try to learn
some of it.

Eli> 1. Text storage
Eli> 2. Support for Unicode Bidirectional Algorithm
Eli> 3. Bidi formatting codes are retained
Eli> 4. Reordering of text for display
Eli> 5. Visual-order information is volatile
Eli> 6. Reordering of strings from `display' properties
Eli> 7. Paragraph base direction
Eli> 8. User control of visual order

Of those points, all but #6 are no-brainers; your choices are exactly
what an editor must do.

Point six is an interesting problem; I'm also unaware of any prior
art.  I suspect that in the long term it would be best to note the
start and end directionality of such chunks of text and set them
chunk-by-chunk in a manner similar to how glyphs are set in the
absence of such properties.  But in the short term I agree with
the choice you outlined.
[...]

Eli Zaretskii | 10 Oct 2009 20:33

Re: Bidirectional editing in Emacs -- main design decisions

> From: James Cloos <cloos <at> jhcloos.com>
> Cc: emacs-devel <at> gnu.org,  emacs-bidi <at> gnu.org
> 
> Thanks for posting that.  It is a great summary of the concerns and
> needs of an editor when dealing with bidi text.

Thanks, but I think it's just the beginning.  There are lots of other
issues to deal with; see, for example, the aspects of search described
by Ehud Karni in this thread.

The hard problem in making these decisions was to become convinced
that all those other issues are reasonably solvable based on these
basic features, without actually solving any of them.

> Of those points, all but #6 are no-brainers; your choices are exactly
> what an editor must do.

Thanks for confirming that.

> Point six is an interesting problem; I'm also unaware of any prior
> art.  I suspect that in the long term it would be best to note the
> start and end directionality of such chunks of text and set them
> chunk-by-chunk in a manner similar to how glyphs are set in the
> absence of such properties.

I think this is impossible in general, because once text is reordered,
the information needed to plug in additional chunks (the resolved
level of each character) is lost.

Note that it is fairly simple to reorder the text of `display' strings
[...]

