Innocenti Maresin | 1 Aug 01:16 2003

kernel keymap and kbd_mode

Hello!

I use Linux console frequently, usually with KOI8-R charset (USSR cyrillic), 
but I also think that optional UTF-8 support is good :-) 
Since the original Red Hat 9 Russian keymap is written in KOI8-R (ignoring Unicode),
it still gives KOI8-R from the keyboard even in UTF-8 mode. That is not right.
But I found that the Linux kernel's keymap is 16bit!
It's possible to load Unicode characters with "loadkeys" via the U+nnnn notation.
I supposed that the inconsistency came from that .kmap not being Unicode-aware.
So I converted all keycodes to Unicode, and the modified keymap gave me UTF-8.
But not only in UTF-8 keyboard mode, in byte mode too :-/
Although my kernel was built with CONFIG_NLS_DEFAULT=koi8-r,
no attempt was made to convert Cyrillic to KOI8-R when I ran "unicode_stop".
The locale in which I ran "unicode_stop" was also koi8r.
Evidently, the stupid "unicode_stop" script can't solve this problem,
and the kernel also failed to give me the desired character translation.
Perhaps "loadkeys" did something incorrectly?
Of course, I can generate defkeymap.c with proper Unicode values,
rebuild my kernel and check there, but...
It seems to me that the kernel code has no ability to translate Unicode to 8bit,
so I am not going to waste my time building a new experimental kernel
just for a change in defkeymap.c :-/
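
The keymap conversion described above (8bit KOI8-R values rewritten in the U+nnnn notation) can be sketched roughly as follows. This is only an illustration, not the script actually used: the function name is hypothetical, and it assumes every value in the old keymap is a plain KOI8-R byte.

```python
# Sketch: turn 8bit KOI8-R keymap values into the U+nnnn notation.
# koi8r_byte_to_uplus is a hypothetical helper name, not part of kbd.

def koi8r_byte_to_uplus(value: int) -> str:
    """Return the U+nnnn form of one KOI8-R byte value (0..255)."""
    char = bytes([value]).decode("koi8_r")  # Python's built-in KOI8-R codec
    return f"U+{ord(char):04x}"


if __name__ == "__main__":
    # 0xC1 and 0xE1 are the KOI8-R bytes for Cyrillic small and capital "a".
    for value in (0xC1, 0xE1):
        print(hex(value), "->", koi8r_byte_to_uplus(value))
```

The resulting strings would then be placed in the keymap in place of the old 8bit values.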

If I'm wrong and such a kernel ability exists, my question is:
HOW do I use it, i.e. how do I use 8bit charsets in "kbd_mode -a" (ASCII or XLATE) mode?
Of course, some software could reload the keymap as 8bit,
but IMHO that would not be a very good design.

If I'm right and the kernel cannot (yet) translate keyboard input to an 8bit charset,
my questions are:

Chris Heath | 5 Aug 03:06 2003

Re: kernel keymap and kbd_mode

> I use Linux console frequently, usually with KOI8-R charset (USSR
> cyrillic), but I also think that optional UTF-8 support is good :-) 

I totally agree.  It is true that unicode_start/unicode_stop is lame,
but more importantly, there are some problems in the kernel itself that
have not been fixed at all in the new kernel 2.6.0-test2.

I am working on some patches, but I don't think they will be accepted
into the kernel.  The kernel maintainers are mostly focussing on
regressions -- things that were working in 2.4 but don't work as well in
2.6.

If you feel like building Linux 2.6.0-test2, I have just put my patches
up on my website.  http://chris.heathens.co.nz/linux/utf8.html

They fix these kernel problems:

 * Fixes for caps lock keys and compose/dead keys.

 * TTY fixes.  When using the cooked TTY line discipline, the delete key
   should delete a full UTF-8 character, not just one byte.

 * Selection fixes.  When you select text on the console with the mouse,
   it doesn't paste text in UTF-8.
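
The TTY item above (delete should remove a whole UTF-8 character) comes down to skipping continuation bytes. A minimal sketch of that rule, not the patch itself; the function name is hypothetical and the input is assumed to be well-formed UTF-8:

```python
# Sketch: erase one whole UTF-8 character from the end of a line buffer.
# UTF-8 continuation bytes have the bit pattern 10xxxxxx.

def erase_last_char(line: bytes) -> bytes:
    """Return the buffer with its last UTF-8 character removed."""
    end = len(line)
    # Step back over any trailing continuation bytes.
    while end > 0 and (line[end - 1] & 0xC0) == 0x80:
        end -= 1
    # Then drop the lead byte (or the single ASCII byte).
    if end > 0:
        end -= 1
    return line[:end]


if __name__ == "__main__":
    buffer = "ab\u0431".encode("utf-8")  # "ab" + Cyrillic small letter be
    print(erase_last_char(buffer))
```

Erasing only one byte, as a byte-oriented line discipline does, would leave a dangling lead byte in the buffer.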

Chris

Andries.Brouwer | 5 Aug 16:50 2003

Re: kernel keymap and kbd_mode

> http://chris.heathens.co.nz/linux/utf8.html

> Fixes for caps lock keys and compose/dead keys.

Hmm. Don't see these.

Chris Heath | 5 Aug 17:58 2003

RE: kernel keymap and kbd_mode

> > http://chris.heathens.co.nz/linux/utf8.html
> 
> > Fixes for caps lock keys and compose/dead keys.
> 
> Hmm. Don't see these.

I have kept it VERY simple by using the existing 8-bit structures.  All
I need to do is change a few put_queue()s into put_8bit()s and
everything that worked in XLATE mode continues to work in UNICODE mode.
(put_8bit() converts into UTF-8 when in Unicode mode.)
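
As an illustration of the idea in the parenthesis above, here is a rough sketch. It is not the code from the patch: the names `to_utf8` and `put_8bit_sketch` are used here only for illustration, and the choice of KOI8-R as the 8bit charset is an assumption (the patch may map 8bit values to Unicode differently).

```python
# Sketch: emit a key value as a raw byte in XLATE mode, as UTF-8 in UNICODE mode.

def to_utf8(code_point: int) -> bytes:
    """Encode one BMP code point (0..0xFFFF, no surrogates) as UTF-8 by hand."""
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogates cannot be encoded")
    if code_point < 0x80:
        return bytes([code_point])
    if code_point < 0x800:
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    raise ValueError("code point outside the BMP")


def put_8bit_sketch(byte_value: int, unicode_mode: bool,
                    charset: str = "koi8_r") -> bytes:
    """Return the bytes that would be queued for one 8bit key value."""
    if not unicode_mode:
        return bytes([byte_value])  # XLATE mode: pass the byte through
    # UNICODE mode: map the byte to a code point (assumed charset), then encode.
    return to_utf8(ord(bytes([byte_value]).decode(charset)))


if __name__ == "__main__":
    print(put_8bit_sketch(0xC1, unicode_mode=False))
    print(put_8bit_sketch(0xC1, unicode_mode=True))
```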

I'm not trying (yet) to add any new functionality to UNICODE mode, just
fixing regressions. 

Chris

Innocenti Maresin | 6 Aug 00:25 2003

Linux console internationalization

Chris Heath wrote:

> > I use Linux console frequently, usually with KOI8-R charset (USSR
> > cyrillic), but I also think that optional UTF-8 support is good :-)
>
> I totally agree.  It is true that unicode_start/unicode_stop is lame,
> but more importantly, there are some problems in the kernel itself that
> have not been fixed at all in the new kernel 2.6.0-test2.

Yes, I realized that, and now I am thinking about a possible kernel modification.
First of all, keyboard.c should be modified
to perform Unicode->8bit translation in XLATE mode,
using inverse_translate defined in consolemap.c.
KOI8-R, Windows-1251 and even CP866 are all in wide use on the Russian-speaking Internet
(they use different 8bit mappings for basically the same Cyrillic characters),
so it is insufficient to limit ourselves to good UTF-8 support only.
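
The Unicode->8bit step proposed above amounts to a lookup in an inverse table. A minimal sketch of that idea, using Python's built-in codecs rather than the kernel's tables; the function names and the "?" fallback are assumptions, not the behaviour of inverse_translate:

```python
# Sketch: build an inverse (Unicode -> 8bit) table for a charset and use it.

def build_inverse_table(charset: str) -> dict:
    """Map each code point of an 8bit charset back to its byte value."""
    table = {}
    for byte_value in range(256):
        try:
            char = bytes([byte_value]).decode(charset)
        except UnicodeDecodeError:
            continue  # this byte is unassigned in the charset
        table.setdefault(ord(char), byte_value)
    return table


def unicode_to_8bit(code_point: int, inverse: dict,
                    fallback: int = ord("?")) -> int:
    """Translate one code point; use the fallback byte if it has no mapping."""
    return inverse.get(code_point, fallback)


if __name__ == "__main__":
    # The same Cyrillic letter (U+0430) has a different byte in each charset.
    for name in ("koi8_r", "cp1251", "cp866"):
        inverse = build_inverse_table(name)
        print(name, hex(unicode_to_8bit(0x0430, inverse)))
```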

I obviously need to coordinate my efforts with you
and the official maintainers of the Linux console.

Further, I plan to fix "loadkeys" (to load Unicode instead of 8bit),
and maybe add support for multiple charsets to the console.
Also, I think that the console-keyboard drivers could be split into 2 versions,
selected by a kernel build option:

the simple one, with a fixed 8bit codepage and unused legacy things like VT100 graphics removed
(for people who don't need i18n or don't work much in the console)

and the sophisticated one, fully functional and well internationalized.

Tomohiro KUBOTA | 6 Aug 00:57 2003

Re: Linux console internationalization

Hi,

From: Innocenti Maresin <av95 <at> comtv.ru>
Subject: Linux console internationalization
Date: Wed, 06 Aug 2003 02:25:32 +0400

> P.S. I have just made a Web page describing my view of Linux console i18n
> and further plans.
> There is also a glossary of the terms used.
> http://www.comtv.ru/~av95/linux/console/

Interesting, but any plan to support more than 512 characters?
512 is apparently much less than east Asian people's need.
(For example, the Japanese basic character set (JIS X 0208) has
several thousand characters.  A 12-year-old Japanese person
should know roughly one thousand characters, and adults should
know many more.)
And how about fullwidth characters (i.e., wcwidth() returns 2)
and combining characters (wcwidth() returns 0),
as xterm supports them?
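
To make the width question concrete, here is a rough approximation of the wcwidth() rule using Python's unicodedata module. It is only a sketch: the function names are hypothetical, and the real wcwidth() also handles control characters and other zero-width cases that this ignores.

```python
# Sketch: approximate column width of characters (0, 1 or 2 cells).
import unicodedata


def cell_width(char: str) -> int:
    """Return an approximate display width for one character."""
    if unicodedata.combining(char):
        return 0  # combining mark: drawn on top of the previous cell
    if unicodedata.east_asian_width(char) in ("W", "F"):
        return 2  # wide or fullwidth: occupies two cells
    return 1


def display_width(text: str) -> int:
    """Sum the approximate widths of all characters in a string."""
    return sum(cell_width(char) for char in text)


if __name__ == "__main__":
    for sample in ("abc", "\u65e5\u672c\u8a9e", "e\u0301"):
        print(repr(sample), display_width(sample))
```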

I am looking forward to linuxconsole project
http://linuxconsole.sourceforge.net/
Do you know the project?

---
Tomohiro KUBOTA <kubota <at> debian.org>
http://www.debian.or.jp/~kubota/

Innocenti Maresin | 6 Aug 01:22 2003

Re: Linux console internationalization

Tomohiro KUBOTA wrote:

> Interesting, but any plan to support more than 512 characters?

Not within VGA text modes.
2^9 is a hardware restriction based on text framebuffer's data semantic.
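
A small sketch of where the 2^9 comes from, as far as I understand the VGA text format: each screen cell is 16 bits, an 8bit character code plus an 8bit attribute, and in 512-glyph mode attribute bit 3 (normally foreground intensity) selects the second font bank. The function name is hypothetical.

```python
# Sketch: glyph number encoded by one VGA text cell.

FONT_BANK_BIT = 0x08  # attribute bit 3, reused as the 9th glyph bit


def glyph_index(char_byte: int, attr_byte: int, use_512: bool) -> int:
    """Return the glyph number selected by a (character, attribute) pair."""
    if use_512 and (attr_byte & FONT_BANK_BIT):
        return 0x100 | char_byte  # second bank: glyphs 256..511
    return char_byte              # first bank: glyphs 0..255


if __name__ == "__main__":
    print(glyph_index(0x41, 0x07, use_512=True))
    print(glyph_index(0x41, 0x0F, use_512=True))
```

There is no further spare bit in the cell, so 512 is the ceiling for glyphs addressable at once.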

> 512 is apparently much less than east Asian people's need.

Of course.
And I think that 9x16 (this is the largest glyph size usable in VGA text)
is apparently much less than is needed to read Japanese glyphs without risk to the eyes.
Even for a 12-year-old Japanese person ;-)
So, VGA text seems not to be an acceptable solution for East Asia.

> I am looking forward to linuxconsole project
> http://linuxconsole.sourceforge.net/
> Do you know the project?

Thank you for this link.

--
qq~~~~\
/ /\   \
\  /_/ /
 \____/

Tomohiro KUBOTA | 6 Aug 03:39 2003

Re: Linux console internationalization

Hi,

From: Innocenti Maresin <av95 <at> comtv.ru>
Subject: Re: Linux console internationalization
Date: Wed, 06 Aug 2003 03:22:27 +0400

> Tomohiro KUBOTA wrote:
> 
> > Interesting, but any plan to support more than 512 characters?
> 
> Not within VGA text modes.
> 2^9 is a hardware restriction based on text framebuffer's data semantic.

I see.  It was with MS/PC-DOS version 6.x (so-called "DOS/V") that
IBM-compatible PCs became able to display Japanese characters on a
text screen.  (Before then, Japan-specific PCs with hardware Japanese
support were used.)

I imagine that MS/PC-DOS used a VGA graphics mode.  (I heard that the "V"
in the name "DOS/V" came from "VGA".)

> And I think that 9x16 (this is the largest glyph size usable in VGA text)
> is apparently much less than is needed to read Japanese glyphs without risk to the eyes.
> Even for a 12-year-old Japanese person ;-)

Right.  On a tty, Japanese characters are displayed using two columns.
For example, when ASCII characters are 8x16, Japanese characters are
16x16.

> So, VGA text seems not to be an acceptable solution for East Asia.

Chris Heath | 6 Aug 14:29 2003

Re: Linux console internationalization

> Also, I think that the console-keyboard drivers could be split into 2 versions,
> selected by a kernel build option:
> 
> the simple one, with a fixed 8bit codepage and unused legacy things like VT100 graphics removed
> (for people who don't need i18n or don't work much in the console)
> 
> and the sophisticated one, fully functional and well internationalized.

Yes, I think this is the way to go.  I imagine we will meet a lot of
resistance if we try to add heavyweight Unicode stuff into the existing
console.

The heavyweight version would need a LOT of requirements gathering
before we even begin programming.  It probably should include support
for:

* lots of encodings, but use Unicode internally
* user-space pluggability for extra-heavyweight stuff like Japanese
   input methods or fonts
* bidi text (Arabic)
* variable-width fonts (CJK)
* variable-width encodings (Unicode combining chars)
* absolute positioning for things like line drawing...
* ANSI / ECMA and VT100 compatibility

Hmmm... I suspect I've hardly even scratched the surface and already this
looks like it's going to be way too big for the kernel.  Maybe the
entire thing should be in user space.

How important is it to have an in-kernel console?

Edward H. Trager | 6 Aug 17:46 2003

Re: Linux console internationalization

On Wednesday 2003.08.06 08:29:37 -0400, Chris Heath wrote:
> > Also, I think that the console-keyboard drivers could be split into 2 versions,
> > selected by a kernel build option:
> > 
> > the simple one, with a fixed 8bit codepage and unused legacy things like VT100 graphics removed
> > (for people who don't need i18n or don't work much in the console)
> > 
> > and the sophisticated one, fully functional and well internationalized.
> 
> Yes, I think this is the way to go.  I imagine we will meet a lot of
> resistance if we try to add heavyweight Unicode stuff into the existing
> console.
> 
> The heavyweight version would need a LOT of requirements gathering
> before we even begin programming.  It probably should include support
> for:
> 
> * lots of encodings, but use Unicode internally

Hmmm ... If I knew how to do this stuff myself (which I don't), I would
not bother with any encoding except Unicode UTF-8 (If people did not like
my patch, they wouldn't have to use it, right?).  That would cut down on
the requirements and complexity a bit ... (a BIG BIT, N'est-ce pas?).

I (and many others ...) would argue that everyone needs to move to Unicode.  
So when you buy that shiny new version of Linux 5 years in the future, 
it's going to support Unicode very well, and it is perhaps no longer going to 
support the 3-5 mutually incompatible legacy encodings of your language that you previously 
had to struggle with ...
