Jonathan Kew | 2 Apr 2010 11:53

mirroring support in hb_shape

Hi Behdad,

I think there's an issue with how (or rather *where*) mirroring is implemented -- currently
hb_mirror_chars is called from hb_substitute_default, right before hb_map_glyphs. The problem with
this is that by the time hb_substitute_default is called, the buffer has been forced to "native"
direction, but AFAICS we need the mirroring to be based on the original direction instead.

This causes problems if we have text such as <RLO>Hello (world)<PDF>, which will form a RTL run;
hb_ensure_native_direction will reverse it, in order that OT lookups can work properly, but this will
prevent the parens being mirrored because hb_mirror_chars will see the run as being LTR.

To resolve this, I've moved the call to hb_mirror_chars out of hb_substitute_default, and put it before
hb_ensure_native_direction, as shown below. Please see if you think this is a reasonable thing to do.

JK

diff --git a/src/hb-shape.c b/src/hb-shape.c
--- a/src/hb-shape.c
+++ b/src/hb-shape.c
 <at>  <at>  -111,17 +111,16  <at>  <at>  hb_map_glyphs (hb_font_t    *font,

 static void
 hb_substitute_default (hb_font_t    *font,
 		       hb_face_t    *face,
 		       hb_buffer_t  *buffer,
 		       hb_feature_t *features,
 		       unsigned int  num_features)
 {
-  hb_mirror_chars (buffer);
   hb_map_glyphs (font, face, buffer);
(Continue reading)

Behdad Esfahbod | 2 Apr 2010 18:14
Favicon
Gravatar

Re: mirroring support in hb_shape

On 04/02/2010 05:53 AM, Jonathan Kew wrote:
> Hi Behdad,
> 
> I think there's an issue with how (or rather *where*) mirroring is implemented -- currently
hb_mirror_chars is called from hb_substitute_default, right before hb_map_glyphs. The problem with
this is that by the time hb_substitute_default is called, the buffer has been forced to "native"
direction, but AFAICS we need the mirroring to be based on the original direction instead.
> 
> This causes problems if we have text such as <RLO>Hello (world)<PDF>, which will form a RTL run;
hb_ensure_native_direction will reverse it, in order that OT lookups can work properly, but this will
prevent the parens being mirrored because hb_mirror_chars will see the run as being LTR.
> 
> To resolve this, I've moved the call to hb_mirror_chars out of hb_substitute_default, and put it before
hb_ensure_native_direction, as shown below. Please see if you think this is a reasonable thing to do.

You are indeed correct.  Very good point.  Will keep that in mind when adding
rtlm/ltrm features.

behdad

> JK
> 
Behdad Esfahbod | 9 Apr 2010 19:29
Favicon
Gravatar

HarfBuzz hacking sprint in planning

Hi,

Martin Hosken and I have been thinking about doing a HarfBuzz hacking sprint.
 Current time/location we have in mind is week of May 17th in Reading, UK.

We are deliberately trying to keep this a small hacking sprint as opposed to a
meeting.  It would be really nice to have Jonathan Kew, Evin Martin, and
Daniel Glassy.  How does your schedules look like for that week?

We'd got to decide fast since I need to go figure my visa out.  Martin kindly
offered to sort a venue out in the local church, or we can try the university.

Anyone else who is nearby / we should definitely invite over?

Cheers,
behdad
Jonathan Kew | 12 Apr 2010 23:43

Re: HarfBuzz hacking sprint in planning

On 9 Apr 2010, at 18:29, Behdad Esfahbod wrote:

> Hi,
> 
> Martin Hosken and I have been thinking about doing a HarfBuzz hacking sprint.
> Current time/location we have in mind is week of May 17th in Reading, UK.
> 
> We are deliberately trying to keep this a small hacking sprint as opposed to a
> meeting.  It would be really nice to have Jonathan Kew, Evin Martin, and
> Daniel Glassy.  How does your schedules look like for that week?

That could work for me. I've put it (tentatively) on the calendar.

JK

> 
> We'd got to decide fast since I need to go figure my visa out.  Martin kindly
> offered to sort a venue out in the local church, or we can try the university.
> 
> Anyone else who is nearby / we should definitely invite over?
> 
> Cheers,
> behdad
> _______________________________________________
> HarfBuzz mailing list
> HarfBuzz@...
> http://lists.freedesktop.org/mailman/listinfo/harfbuzz

Lars Knoll | 14 Apr 2010 17:14

harfbuzz: Branch 'master'

 src/harfbuzz-indic.cpp        |   23 ++++++++++++++++-
 src/harfbuzz-shaper-private.h |   55 ++++++++++++++++++++++--------------------
 tests/shaping/main.cpp        |    2 +
 3 files changed, 52 insertions(+), 28 deletions(-)

New commits:
commit c638ddc70f6a8196f2c8b11808ab01510233a0ee
Author: Lars Knoll <lars.knoll@...>
Date:   Wed Apr 14 17:01:49 2010 +0200

    Fix a bug in malayalam shaping

    See http://bugreports.qt.nokia.com/browse/QTBUG-1887.
    We were not finding the base character correctly in the case
    where the syllable contained a ZWJ.

    In addition, the indic OT specs require us to also apply the 'loca',
    'cjct' and 'calt' features. They seem to be mostly unused by todays
    fonts, but we should better apply them anyways.

diff --git a/src/harfbuzz-indic.cpp b/src/harfbuzz-indic.cpp
index 3c9df93..4d8418b 100644
--- a/src/harfbuzz-indic.cpp
+++ b/src/harfbuzz-indic.cpp
 <at>  <at>  -1107,6 +1107,7  <at>  <at>  static inline void splitMatra(unsigned short *reordered, int matra, int &len)

 #ifndef NO_OPENTYPE
 static const HB_OpenTypeFeature indic_features[] = {
+    { HB_MAKE_TAG('l', 'o', 'c', 'a'), LocaProperty },
     { HB_MAKE_TAG('c', 'c', 'm', 'p'), CcmpProperty },
(Continue reading)

John Daggett | 15 Apr 2010 08:16

harfbuzz review

As part of ongoing work at Mozilla I was asked to review the new
harfbuzz-ng code for inclusion in Firefox. I realize there are other
projects with differing constraints involved but I'm concerned about
the structure of the current code. Given the relative complexity of
OpenType layout and the structure of the GSUB/GPOS/GDEF tables,
OpenType shaping code is bound to be somewhat complex.  But I think
the structure of Harfbuzz code currently is much, much too complex.

One part of this complexity is the mishmash of language conventions
used, the API is C but the code is a limited subset of C++.  On the
mailing list, there seems to be confusion about this:

http://lists.freedesktop.org/archives/harfbuzz/2009-November/000460.html

> On Fri Nov 6 07:14:27 PST 2009, Adam Langley agl at google.com wrote:
> On Thu, Nov 5, 2009 at 11:46 PM, Martin Hosken <martin_hosken at sil.org> wrote:
>> I'm about to try adding graphite support as a contrib to
>> harfbuzz-ng. Graphite itself is in C++ and so the linking code is
>> going to have to bridge between C and C++. But I notice that you go
>> to great lengths to ensure that harfbuzz doesn't link to libstdc++.
>> I am wondering why this is the case.
> 
> Harfbuzz is a C library. It would be very sad if it needed to pull in
> libstdc++. That would pollute Pango, and then the rest of the
> GNOME/GTK world.
> 
> AGL

Despite this, some of the code in Harfbuzz is clearly using C++ but in
a way so as to avoid pulling in libstd++.  There's a big mix of
(Continue reading)

John Daggett | 15 Apr 2010 08:20

Re: harfbuzz review

Based on feedback from Behdad, I made a few additional comments, included below.

Original post here:
https://bugzilla.mozilla.org/show_bug.cgi?id=449292#c55

> It's definitely more complex than the code it's replacing (which was
> plain C). However, it's very carefully designed, easier to audit,
> and I'm much more confident in it than in the code it replaces.

Complex code for dealing with complex data is one thing but the
Harfbuzz code makes even simple things complicated in some cases. Take
as an example some of the struct's defined in hb-open-type-private.hh,
there's a whole set of *structs* defined for basic data types.  There
are far, far simpler ways of dealing with endian data than this.  What
advantage does this complexity add?  Does it somehow simplify the code?

============== from hb-open-type-private.hh ==============

#define _DEFINE_INT_TYPE1_UNALIGNED(NAME, TYPE, BIG_ENDIAN, BYTES) \
  struct NAME \
  { \
    inline NAME& set (TYPE i) { (TYPE&) v = BIG_ENDIAN (i); return *this; } \
    inline operator TYPE(void) const { return BIG_ENDIAN ((TYPE&) v); } \
    inline bool operator== (NAME o) const { return (TYPE&) v == (TYPE&) o.v; }
\
    inline bool sanitize (SANITIZE_ARG_DEF) { \
      TRACE_SANITIZE (); \
      return SANITIZE_SELF (); \
    } \
    private: unsigned char v[BYTES]; \
(Continue reading)

Behdad Esfahbod | 15 Apr 2010 20:28
Favicon
Gravatar

Re: harfbuzz review

On 04/15/2010 02:20 AM, John Daggett wrote:

>> It's definitely more complex than the code it's replacing (which was
>> plain C). However, it's very carefully designed, easier to audit,
>> and I'm much more confident in it than in the code it replaces.
> 
> Complex code for dealing with complex data is one thing but the
> Harfbuzz code makes even simple things complicated in some cases. Take
> as an example some of the struct's defined in hb-open-type-private.hh,
> there's a whole set of *structs* defined for basic data types.  There
> are far, far simpler ways of dealing with endian data than this.  What
> advantage does this complexity add?  Does it somehow simplify the code?

I'm all ears to hear about those "far, far simpler ways of dealing with endian
data".  What this code does is to obviate the need to call a bytecode
conversion function/macro/... in thousands of places, each of which could be
omitted by mistake.  So, review the ten lines and you're confident that there
is no endian bugs in that 5kloc.

Other than endianness, this chunk of code also defines int types that have no
alignment requirements.

The use of a macro is to avoid copy/paste'ing the those ten lines 4 times (for
USHORT, SHORT, ULONG, LONG).

Of the chunk you pasted, one is an optimized version of the other one, and has
a TODO item before it that you didn't quote:

/* TODO On machines that allow unaligned access, use this version. */
#define _DEFINE_INT_TYPE1_UNALIGNED(NAME, TYPE, BIG_ENDIAN, BYTES) \
(Continue reading)

Behdad Esfahbod | 15 Apr 2010 21:54

harfbuzz-ng: Branch 'master'

 src/hb-open-type-private.hh |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

New commits:
commit 9b39755d104603d1163738f77637cc1923d4055b
Author: Behdad Esfahbod <behdad@...>
Date:   Thu Apr 15 14:00:25 2010 -0400

    Typo

diff --git a/src/hb-open-type-private.hh b/src/hb-open-type-private.hh
index 169b04b..6dc3e30 100644
--- a/src/hb-open-type-private.hh
+++ b/src/hb-open-type-private.hh
 <at>  <at>  -398,8 +398,8  <at>  <at>  struct Tag : ULONG
   inline bool sanitize (SANITIZE_ARG_DEF) {
     TRACE_SANITIZE ();
     /* Note: Only accept ASCII-visible tags (mind DEL)
-     * This is one of the few times (only time?) we check
-     * for data integrity, as opposed o just boundary checks
+     * This is one of the few places (only place?) that we check
+     * for data integrity, as opposed to just boundary checks.
      */
     return SANITIZE_SELF () && (((uint32_t) *this) & 0x80808080) == 0;
   }
John Daggett | 16 Apr 2010 08:19

Re: harfbuzz review

Behdad Esfahbod wrote:

> >> You seem to drastically under-estimate the headaches of
> >> lifecycle-management in a library.  Mozilla also disables cairo's
> >> mutex use.  So I don't think Mozilla is the perfect use case of
> >> those features.  When you are dealing with FT_Face and hb_face_t
> >> objects passed around between cairo, pango, poppler, Python
> >> wrappers, etc, the resource management API is absolutely
> >> necessary IMO.
> >
> > I'm not underestimating it, I'm simply arguing that the code for
> > doing that resource management does not belong in Harfbuzz.
> 
> "resource management" != object lifecycle management.

I'm not arguing that resource management and/or object lifecycle
management aren't needed, I'm simply saying that code for doing these
functions belongs outside the core shaping code and is better-handled
via an API that apps can tune to fit their needs.

> Allowing the user code to cache the sanitized tables is planned,
> just not provided right now.

Ok, great.

> You ignored part of my reply that I said I have plans to add API to
> decouple sanitizing from layout.  In my design, when that API is
> added, you get the functionality you are asking for.  Remains your
> request that the sanitize code is put in a separate module.  I don't
> see any benefit in doing that.
(Continue reading)


Gmane