Florian Weimer | 1 May 19:25 2000

Re: wcwidth() implementation

  Markus Kuhn <Markus.Kuhn <at> cl.cam.ac.uk> writes:

> Attached is my public domain implementation of the wcwidth() and
> wcswidth() functions. I hope you will find it useful for inclusion into
> glibc, xterm, etc. The function wcwidth() distinguishes between normal,
> wide, and combining characters, and wcswidth() can be used to predict
> how many columns a string sent to a terminal emulator such as xterm will
> occupy on the screen.

Hmm.  Your implementation restricts wide characters mainly to the
East Asian ranges.  But there are many other characters that can hardly
be displayed using normal-width glyphs, for example:

        ∰   U+2230   VOLUME INTEGRAL
        ⒛   U+249B   NUMBER TWENTY FULL STOP
        ⒨   U+24A8   PARENTHESIZED LATIN SMALL LETTER M
        ﬄ   U+FB04   LATIN SMALL LIGATURE FFL

I think wcwidth() could even return 3 for these characters and many
more. ;)
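The table-driven approach under discussion can be sketched in C as follows. This is a minimal illustration under stated assumptions, not Markus Kuhn's actual code: the names `sketch_wcwidth`, `sketch_wcswidth` and `in_table` are hypothetical, and the interval tables are heavily abbreviated (a usable version derives them from the Unicode character database). It returns -1 for control characters, 0 for combining marks, 2 for a few East Asian wide ranges, and 1 otherwise. As a result the characters listed above (U+2230, for example) come out as width 1, which is exactly the behaviour being questioned.

```c
#include <stddef.h>
#include <wchar.h>

/* A closed range [first, last] of code points. */
struct interval {
    wchar_t first;
    wchar_t last;
};

/* Binary search: nonzero if UCS lies in one of the N sorted,
   non-overlapping ranges in TABLE. */
static int in_table(wchar_t ucs, const struct interval *table, size_t n)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (ucs < table[mid].first)
            hi = mid;
        else if (ucs > table[mid].last)
            lo = mid + 1;
        else
            return 1;
    }
    return 0;
}

/* Abbreviated, illustrative tables only; a real implementation needs
   the complete lists derived from the Unicode character database. */
static const struct interval combining[] = {
    { 0x0300, 0x036F },   /* combining diacritical marks */
    { 0x20D0, 0x20FF },   /* combining marks for symbols */
};
static const struct interval wide[] = {
    { 0x1100, 0x115F },   /* Hangul Jamo initial consonants */
    { 0x2E80, 0xA4CF },   /* CJK radicals ... Yi (U+303F excluded below) */
    { 0xAC00, 0xD7A3 },   /* Hangul syllables */
    { 0xF900, 0xFAFF },   /* CJK compatibility ideographs */
    { 0xFF00, 0xFF60 },   /* fullwidth forms */
    { 0xFFE0, 0xFFE6 },   /* fullwidth signs */
};

#define COUNT(a) (sizeof (a) / sizeof (a)[0])

/* Column width of one character: -1 for controls, 0 for NUL and
   combining marks, 2 for wide characters, 1 for everything else. */
int sketch_wcwidth(wchar_t ucs)
{
    if (ucs == 0)
        return 0;
    if (ucs < 0x20 || (ucs >= 0x7F && ucs < 0xA0))
        return -1;                       /* C0/C1 control characters */
    if (in_table(ucs, combining, COUNT(combining)))
        return 0;
    if (ucs != 0x303F && in_table(ucs, wide, COUNT(wide)))
        return 2;                        /* U+303F is a narrow exception */
    return 1;
}

/* Sum of the widths of at most N characters of S, stopping at L'\0';
   -1 if any of them is non-printable. */
int sketch_wcswidth(const wchar_t *s, size_t n)
{
    int width = 0;
    for (; n > 0 && *s != L'\0'; s++, n--) {
        int w = sketch_wcwidth(*s);
        if (w < 0)
            return -1;
        width += w;
    }
    return width;
}
```

For example, `sketch_wcswidth(L"e\u0301\u4E00", 3)` gives 1 + 0 + 2 = 3 columns. Binary search over sorted intervals keeps each lookup logarithmic in the table size, which starts to matter once the full combining-character table is used.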
-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/

Edmund GRIMLEY EVANS | 2 May 15:32 2000

Re: what shall we do about iconv?

About a month ago I wrote here that "there seems to be no sensible way
of implementing a function that converts data while reading it from a
stream and knows at the end how many non-reversible conversions
occurred". I tried contacting the Open Group about this, and I
received some replies from Andrew Josey.

The Open Group thinks that the specification is clear enough: iconv()
should return (size_t)-1 whenever one of the conditions EILSEQ, E2BIG,
EINVAL or EBADF occurs. Application code is already reliant on this
behaviour, so it cannot be changed. Apparently the problem I pointed
out is real, but it would have to be solved using an alternate API
instead of iconv. I don't know whether anyone is likely to take any
concrete steps towards defining such an API, but I feel happier now
that I know what the situation is.
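The problem can be made concrete with a C sketch of a chunked conversion loop built on the standard `iconv_open()`, `iconv()` and `iconv_close()` interface. The helper name `convert_stream` and its 8-byte staging buffer are hypothetical, chosen to mimic output going to a stream. Because iconv() returns (size_t)-1 with errno set to E2BIG whenever the output buffer fills, any non-reversible conversions performed during such a call are never reported, so the running total can undercount.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert INLEN bytes at IN from charset FROM to TO,
   copying the result into OUT (capacity OUTCAP) through a small staging
   buffer, as a reader feeding a stream would.  Returns the number of bytes
   written or (size_t)-1 on error.  *IRREVERSIBLE gets the sum of the
   non-reversible conversion counts that iconv() actually reported. */
size_t convert_stream(const char *to, const char *from,
                      const char *in, size_t inlen,
                      char *out, size_t outcap, size_t *irreversible)
{
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *inp = (char *)in;          /* iconv() takes char **, not const */
    size_t inleft = inlen, written = 0;
    int flushing = 0;                /* set once all input is consumed */
    *irreversible = 0;

    for (;;) {
        char stage[8];               /* small on purpose: forces E2BIG */
        char *sp = stage;
        size_t sleft = sizeof stage;
        size_t r = flushing ? iconv(cd, NULL, NULL, &sp, &sleft)
                            : iconv(cd, &inp, &inleft, &sp, &sleft);
        int err = (r == (size_t)-1) ? errno : 0;
        size_t produced = sizeof stage - sleft;

        if (written + produced > outcap)
            goto fail;
        memcpy(out + written, stage, produced);
        written += produced;

        if (r != (size_t)-1) {
            *irreversible += r;      /* only successful calls carry a count */
            if (flushing)
                break;
            flushing = 1;            /* next: emit any final shift sequence */
        } else if (err == E2BIG && produced > 0) {
            continue;                /* buffer drained; this call's count is lost */
        } else {
            goto fail;               /* EILSEQ, EINVAL, or no progress possible */
        }
    }
    iconv_close(cd);
    return written;

fail:
    iconv_close(cd);
    return (size_t)-1;
}
```

Nothing inside such a loop can recover the lost counts; doing so would need an interface that reports the count even when a call stops early, which is the kind of alternate API mentioned above.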

Just thought I'd register this for the archives ...

Edmund

Bruno Haible | 2 May 16:01 2000

Re: what shall we do about iconv?

Attachment: application/octet-stream, 1517 bytes
Bruno Haible | 2 May 16:57 2000

Updated HOWTO

Attachment: application/octet-stream, 762 bytes
etrapani | 3 May 22:55 2000

wprint 1.01

From freshmeat
(http://www.freshmeat.net/appindex/2000/04/26/956785619.html):

WorldPrint is a filter for Netscape's postscript output that uses
TrueType fonts to allow the printing of pages written in Unicode, Big5,
SJIS, the ISO-8859* charsets (and maybe others). This does not require
Netscape to be able to render the full text on screen.

I thought you might be interested.

Bye, Eduardo.

PILCH Hartmut | 4 May 08:05 2000

Re: Printing and non-latin1 pages

> Proposals?  It would be neat to have a comment in the postscript file
> stating what was the original encoding of the page.  That way I could
> automatically convert the file without having the user specify the
> encoding.  That is, if we get the original strings.  If we get UTF8 or
> UTF16 in the PostScript file, there would be no need for that.

Sounds great.

But I must be missing something here, because I just installed the Wadalab
Japanese PostScript fonts in the /usr/lib/ghostscript/ path, as is usually
done in Japanese distributions, and everything works without any
postprocessing filter.

We have t1utils and ttf2pfb for converting TTF to PFB (Type 1
PostScript, compressed) perfectly, and it should not be too difficult to
create some Unicode PFB files, and even CID files, which work with
Ghostscript 5.5 and later.  Ken Lunde has made CID versions of the Wadalab
fonts available.

Already now, EUC-JP, SJIS and Latin-1 all print correctly under my
Netscape 4.6 without postprocessing, and with everything unified to UCS,
everything should be even more straightforward.

-phm

Bruno Haible | 4 May 20:27 2000

how to test the new glibc-2.2's UTF-8 locales

Attachment: application/octet-stream, 10 KiB
Robert Brady | 4 May 20:30 2000

Bengali

Are there any readers of the Bengali script on this mailing list? Does
anyone know anyone who does?

-- 
Robert

M.K.Laha | 5 May 05:59 2000

Need info on Linux & Bengali

> Date:  Thu, 04 May 00 13:49PM CDT  
> From:  Robert Brady <rwb197 <at> ecs.soton.ac.uk>  
> To:  linux-utf8 <at> nl.linux.org  
> Subject:  Bengali  
>  
> Are there any readers of the Bengali script on this mailing list? Does
> anyone know anyone who does?
> 
> -- 
> Robert

Hi!

I read and write in the Bengali script. I would love to be able to
do that using Linux. My text processor is groff. I'd appreciate any
directions as to how I could extend groff to handle Bengali. I am
prepared to help with the implementation, too.

- Manas Laha

Ulrich Drepper | 4 May 20:57 2000

Re: how to test the new glibc-2.2's UTF-8 locales

Bruno Haible <haible <at> ilog.fr> writes:

>   - The wide character properties (wcwidth, iswupper, etc.) are very
>     different from the tables of the Unicode consortium.

This is an issue with the locale data files, since that information
comes from them.

>   - There is no UTF-7 support in iconv.

I currently don't think that promoting this ill-designed encoding is
useful.  Let it die.  Don't tell anybody that it ever existed.

> - Sources:
>   - glibc CVS sources, instructions are at http://sourceware.cygnus.com/glibc/
>     (remember to use "cvs -z 9" to save network bandwidth)

Please use -z3.  The difference in the amount of data transferred is
minimal, but the lower compression level puts much less load on the server.

-- 
---------------.      drepper at gnu.org  ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \    ,-------------------'   \  Sunnyvale, CA 94089 USA
Red Hat          `--' drepper at redhat.com   `------------------------