Bruno Haible | 1 Apr 22:35 2003

addition: binary-io.h

A new module that allows setting file descriptors to binary mode on
platforms where this makes a difference. Mostly taken from coreutils.

OK to add?

Bruno

========================== binary-io.h ==========================
/* Binary mode I/O.
   Copyright (C) 2001 Free Software Foundation, Inc.

   This program is free software; you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation; either version 2, or (at your option)
   any later version.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program; if not, write to the Free Software Foundation,
   Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.  */

#ifndef _BINARY_H
#define _BINARY_H

#include <fcntl.h>
/* For systems that distinguish between text and binary I/O.
[...]

Jim Meyering | 2 Apr 10:24 2003

Re: addition: binary-io.h

Bruno Haible <bruno <at> clisp.org> wrote:
> A new module that allows setting file descriptors to binary mode on
> platforms where this makes a difference. Mostly taken from coreutils.
>
> OK to add?

Hi Bruno,

Thank you.  This looks good.
A couple of formatting nits:
-----------------------

Please omit the redundant parentheses here:

  # if !(defined(__EMX__) || defined(__DJGPP__))

This style is more consistent:

  # if !(defined __EMX__ || defined __DJGPP__)

-----------------------
Please put a space before an opening parenthesis
and after each comma on the RHS of a #define:

  #  define SET_BINARY(fd) (!isatty(fd) ? (setmode(fd,O_BINARY), 0) : 0)
  # else
  #  define SET_BINARY(fd) setmode(fd,O_BINARY)

i.e.

[...]
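The two requests above (no parentheses after `defined`, and a space before each opening parenthesis and after each comma) can be illustrated with a minimal sketch. It is not the contents of the posted binary-io.h: the `__DJGPP__` test guarding the `isatty` variant, the no-op fallback, and the helper `sketch_set_binary` are assumptions made only so that the fragment compiles on its own.

```c
#include <assert.h>
#include <fcntl.h>

#if defined O_BINARY && O_BINARY != 0
/* Assumption: a DOS-like platform whose <io.h> declares setmode.  */
# include <io.h>
# if defined __DJGPP__
/* Assumption: this is the branch that wants the isatty check.  */
#  include <unistd.h>
#  define SET_BINARY(fd) (!isatty (fd) ? (setmode (fd, O_BINARY), 0) : 0)
# else
#  define SET_BINARY(fd) setmode (fd, O_BINARY)
# endif
#else
/* Assumption: text and binary I/O are identical here, so do nothing.  */
# define SET_BINARY(fd) ((void) (fd), 0)
#endif

/* Hypothetical helper: put FD into binary mode.
   Return 0 on success, -1 if the platform call reported failure.  */
static int
sketch_set_binary (int fd)
{
  assert (fd >= 0);
  return SET_BINARY (fd) == -1 ? -1 : 0;
}
```

A program that copies raw bytes would call `sketch_set_binary (0)` and `sketch_set_binary (1)` once at startup, before reading or writing anything.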

Bruno Haible | 2 Apr 12:55 2003

Re: addition: binary-io.h

Jim Meyering writes:
> Thank you.  This looks good.

OK. I'm putting it in.

> A couple of formatting nits:

Oh, you mean that the GNU coding standards also apply to preprocessor
directives? :-)

Bruno

Bruno Haible | 2 Apr 13:08 2003

addition: utf8-ucs4.h, ucs4-utf8.h, utf16-ucs4.h, ucs4-utf16.h

Hi,

Here come 4 new modules, for converting between UTF-8 / UTF-16 and
UCS-4. I've needed these conversions sufficiently often in gettext:

    utf8-ucs4.h   4 times
    ucs4-utf8.h   2 times
    utf16-ucs4.h  2 times

The functions convert just one character. It is simple to convert an
entire string by calling the functions in a loop. For speed, the case
of ASCII characters is optimized through an inline function.
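The design described above (a one-character converter, an inline ASCII fast path, and a loop for whole strings) might look roughly like the following sketch. All names starting with `sketch_` are hypothetical and are not the interface of the proposed utf8-ucs4.h; replacing malformed input with U+FFFD is likewise an assumption of this example.

```c
#include <assert.h>
#include <stddef.h>

/* Slow path: decode one multibyte sequence from S (N bytes available).
   Store the code point in *PUC and return the number of bytes consumed.
   Malformed or truncated input yields U+FFFD and consumes one byte.  */
static int
sketch_u8_decode_slow (unsigned int *puc, const unsigned char *s, size_t n)
{
  unsigned char c = s[0];

  if (c >= 0xc2 && c < 0xe0)
    {
      if (n >= 2 && (s[1] ^ 0x80) < 0x40)
        {
          *puc = ((unsigned int) (c & 0x1f) << 6) | (s[1] ^ 0x80);
          return 2;
        }
    }
  else if (c >= 0xe0 && c < 0xf0)
    {
      if (n >= 3 && (s[1] ^ 0x80) < 0x40 && (s[2] ^ 0x80) < 0x40
          && (c > 0xe0 || s[1] >= 0xa0)   /* reject overlong forms */
          && (c != 0xed || s[1] < 0xa0))  /* reject surrogates */
        {
          *puc = ((unsigned int) (c & 0x0f) << 12)
                 | ((unsigned int) (s[1] ^ 0x80) << 6)
                 | (s[2] ^ 0x80);
          return 3;
        }
    }
  else if (c >= 0xf0 && c < 0xf5)
    {
      if (n >= 4 && (s[1] ^ 0x80) < 0x40 && (s[2] ^ 0x80) < 0x40
          && (s[3] ^ 0x80) < 0x40
          && (c > 0xf0 || s[1] >= 0x90)   /* reject overlong forms */
          && (c < 0xf4 || s[1] < 0x90))   /* reject values above U+10FFFF */
        {
          *puc = ((unsigned int) (c & 0x07) << 18)
                 | ((unsigned int) (s[1] ^ 0x80) << 12)
                 | ((unsigned int) (s[2] ^ 0x80) << 6)
                 | (s[3] ^ 0x80);
          return 4;
        }
    }
  *puc = 0xfffd;
  return 1;
}

/* Fast path: ASCII is handled inline, everything else is delegated.  */
static inline int
sketch_u8_decode (unsigned int *puc, const unsigned char *s, size_t n)
{
  assert (s != NULL && n > 0);
  if (s[0] < 0x80)
    {
      *puc = s[0];
      return 1;
    }
  return sketch_u8_decode_slow (puc, s, n);
}

/* Convert a whole string by calling the one-character function in a loop.
   OUT must have room for N elements.  Return the number of code points.  */
static size_t
sketch_u8_to_ucs4 (const unsigned char *s, size_t n, unsigned int *out)
{
  size_t count = 0;

  while (n > 0)
    {
      int used = sketch_u8_decode (&out[count], s, n);
      count++;
      s += used;
      n -= (size_t) used;
    }
  return count;
}
```

Keeping the ASCII test in a small inline function means that the common case costs one comparison and one store, while the bulkier validation code is compiled only once.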

Any objections?

Bruno

========================= utf8-ucs4.h =========================
/* Conversion UTF-8 to UCS-4.
   Copyright (C) 2001-2002 Free Software Foundation, Inc.
   Written by Bruno Haible <haible <at> clisp.cons.org>, 2001.

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2, or (at your option)
any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
[...]

Jim Meyering | 2 Apr 15:20 2003

Re: addition: binary-io.h

>> A couple of formatting nits:
>
> Oh, you mean that the GNU coding standards also apply to preprocessor
> directives? :-)

:-)

It's too bad that GNU indent can't help make
the coding style used in the RHS of a #define
be consistent with the style used everywhere else.

Bruno Haible | 3 Apr 21:05 2003

addition: linebreak.h, linebreak.c

Hi,

I propose to add the internationalized line breaking engine from GNU
gettext to gnulib. Unlike the "break at spaces" heuristic which
doesn't work for Chinese text, this code - an implementation of the
Unicode line breaking algorithm - works for most languages of the
world. (A South-East Asian language is still unsupported, because I'd
need a free dictionary for it.)

I want to put it into gnulib in the hope that it will be used e.g. as
an alternative algorithm (option -U, --unicode) of 'fmt' in coreutils.

The file defines the functions in 3 variants, for UTF-8, UTF-16 and
UCS-4 strings; I usually comment out two of them through "#if 0"
in order to reduce the size of the generated executables.
Furthermore, there is a fourth variant, which works on multibyte
strings (implemented via iconv on top of the UTF-8 code). It is this
last variant that most programs use.

Speak up if you have objections!

Bruno

========================== linebreak.h ==========================
/* linebreak.h - line breaking of Unicode strings
   Copyright (C) 2001-2002 Free Software Foundation, Inc.
   Written by Bruno Haible <haible <at> clisp.cons.org>, 2001.

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
[...]

Jim Meyering | 4 Apr 15:08 2003

Re: addition: linebreak.h, linebreak.c

Bruno Haible <bruno <at> clisp.org> wrote:
> I propose to add the internationalized line breaking engine from GNU
> gettext to gnulib. Unlike the "break at spaces" heuristic which
> doesn't work for Chinese text, this code - an implementation of the
> Unicode line breaking algorithm - works for most languages of the
> world. (A South-East Asian language is still unsupported, because I'd
> need a free dictionary for it.)
>
> I want to put it into gnulib in the hope that it will be used e.g. as
> an alternative algorithm (option -U, --unicode) of 'fmt' in coreutils.

Sounds like a fine idea.
I prefer to avoid adding short-named options, unless it's to
provide compatibility with some existing implementation.

> The file defines the functions in 3 variants, for UTF-8, UTF-16 and
> UCS-4 strings; I usually comment out two of them through "#if 0"
> in order to reduce the size of the generated executables.
> Furthermore, there is a fourth variant, which works on multibyte
> strings (implemented via iconv on top of the UTF-8 code). It is this
> last variant that most programs use.
>
> Speak up if you have objections!

No objection, but some of those lines are too long ;-)
Would you please ensure that the max line length doesn't exceed 80?

> extern void u8_possible_linebreaks (const unsigned char *s, size_t n, const char *encoding, char *p);
...
> extern int u8_width_linebreaks (const unsigned char *s, size_t n, int width, int start_column, int
[...]

Paul Eggert | 5 Apr 10:17 2003

Re: addition: linebreak.h, linebreak.c

Bruno Haible <bruno <at> clisp.org> writes:

> The file defines the functions in 3 variants, for UTF-8, UTF-16 and
> UCS-4 strings; I usually comment out two of them through "#if 0"
> in order to reduce the size of the generated executables.

> Furthermore, there is a fourth variant, which works on multibyte
> strings (implemented via iconv on top of the UTF-8 code). It is this
> last variant that most programs use.

> /* Determine number of column positions required for UC. */
> extern int uc_width (unsigned int uc, const char *encoding);

Is UC a Unicode code position?  Perhaps it should be typedefed, both for
clarity and for efficiency on weird hosts?  E.g.:

  typedef int_fast32_t unicode_char;
  int uc_width (unicode_char uc, const char *encoding);

Won't it be faster if we add an extra function that converts ENCODING
to a small integer or a pointer that represents the encoding, and pass
that small integer or pointer to uc_width instead of passing ENCODING?
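A minimal sketch of that suggestion follows: the encoding name is resolved once into a small handle, and the per-character function takes the handle instead of the string. Apart from the `unicode_char` typedef given above, every name here is hypothetical, the list of encoding names is illustrative, and the width rules are a toy stand-in for real width tables.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The typedef suggested above.  */
typedef int_fast32_t unicode_char;

/* Hypothetical handle: what the width computation needs to know about
   an encoding, looked up once instead of on every call.  */
typedef enum
{
  SKETCH_ENC_OTHER,
  SKETCH_ENC_CJK
} sketch_encoding_class;

/* Hypothetical lookup.  The name list is illustrative, not complete.  */
static sketch_encoding_class
sketch_classify_encoding (const char *encoding)
{
  static const char *const cjk_names[] =
    { "EUC-JP", "EUC-KR", "GB2312", "BIG5" };
  size_t i;

  assert (encoding != NULL);
  for (i = 0; i < sizeof cjk_names / sizeof cjk_names[0]; i++)
    if (strcmp (encoding, cjk_names[i]) == 0)
      return SKETCH_ENC_CJK;
  return SKETCH_ENC_OTHER;
}

/* Toy width function: no string comparison happens per character.
   Assumption for illustration only: characters of ambiguous width take
   two columns in CJK encodings and one column elsewhere.  */
static int
sketch_uc_width (unicode_char uc, sketch_encoding_class enc)
{
  if (uc >= 0x4E00 && uc <= 0x9FFF)   /* CJK Unified Ideographs */
    return 2;
  if (uc >= 0x0391 && uc <= 0x03A1)   /* some Greek capitals */
    return enc == SKETCH_ENC_CJK ? 2 : 1;
  return 1;
}
```

A caller would invoke `sketch_classify_encoding` once per string or per locale and then reuse the result for every character, which moves the string comparisons out of the inner loop.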

> /* Determine number of column positions required for first N units
>    (or fewer if S ends before this) in S.  */
> extern int u8_width (const unsigned char *s, size_t n, const char *encoding);
> extern int u16_width (const unsigned short *s, size_t n, const char *encoding);
> extern int u32_width (const unsigned int *s, size_t n, const char *encoding);

I was confused by the prefixes u8, u16, and u32.  At first I thought
[...]

Paul Eggert | 5 Apr 10:19 2003

Re: addition: linebreak.h, linebreak.c

Jim Meyering <jim <at> meyering.net> writes:

> No objection, but some of those lines are too long ;-)
> Would you please ensure that the max line length doesn't exceed 80?
> 
> > extern void u8_possible_linebreaks (const unsigned char *s, size_t n, const char *encoding, char *p);

I generally omit the "extern " in declarations like that.
"extern " adds nothing, and omitting it saves 7 columns.

Bruno Haible | 7 Apr 13:17 2003

Re: addition: linebreak.h, linebreak.c

Paul Eggert writes:
> I generally omit the "extern " in declarations like that.
> "extern " adds nothing, and omitting it saves 7 columns.

I generally add the "extern " in declarations like that, for three
reasons:

* It makes it clear to the human reader that he should not expect the
  implementation of the function in the same file.

* It makes declarations of variables and functions more consistent.

* It makes it easy to preprocess the file for use in Microsoft DLLs:
  sed -e 's/extern /extern LIBXXX_DLL_EXPORTED /'
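To make the third point concrete, here is a rough sketch of what a header could look like after that sed command has run. Only `LIBXXX_DLL_EXPORTED` comes from the command above; the `BUILDING_LIBXXX` switch, the way the macro is defined, and the function `sketch_answer` are assumptions for illustration.

```c
/* Assumption: pretend this translation unit is part of the library.  */
#define BUILDING_LIBXXX 1

/* Hypothetical definition of the macro that the sed command inserts.  */
#if defined _WIN32 && defined BUILDING_LIBXXX
# define LIBXXX_DLL_EXPORTED __declspec (dllexport)
#elif defined _WIN32
# define LIBXXX_DLL_EXPORTED __declspec (dllimport)
#else
# define LIBXXX_DLL_EXPORTED
#endif

/* Before sed:  extern int sketch_answer (void);
   After sed:  */
extern LIBXXX_DLL_EXPORTED int sketch_answer (void);

/* The implementation would normally live in a separate .c file.  */
int
sketch_answer (void)
{
  return 42;
}
```

Because every public declaration starts with the same `extern ` token, a one-line textual substitution is enough; declarations written without `extern` would each need to be located and edited individually.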

Bruno
