Karen Seo | 3 Jan 2005 06:13

2005 Network and Distributed System Security Symposium (NDSS '05)

  ** My apologies if you receive multiple copies of this message. **

               ANNOUNCING THE INTERNET SOCIETY'S
2005 NETWORK AND DISTRIBUTED SYSTEM SECURITY SYMPOSIUM (NDSS'05)

February 2, 2005 - Pre Conference Workshop
February 3-4, 2005 - Symposium
Catamaran Resort Hotel, San Diego, California
General Chair:  Eric Harder, National Security Agency
Program co-Chairs: Dan Simon, Microsoft Research
                   Dan Boneh, Stanford University

ONLINE INFORMATION AND REGISTRATION: http://www.isoc.org/ndss05/

    The 12th annual NDSS Symposium brings together innovative and
    forward thinking members of the Internet community including
    leading edge security researchers and implementers, globally
    recognized security technology experts, and users from both
    the private and public sectors who design, develop, exploit,
    and deploy the technologies that define network and distributed
    system security.

    NDSS'05 provides a balanced mix of technical papers (with a
    strong emphasis on implementation) that cover new and practical
    approaches to security problems that are endemic to network and
    distributed systems. 

THIS YEAR'S TOPICS INCLUDE:
     * Cryptography in Network Security
     * Denial of Service Attacks
     * Peer to Peer Approaches
     * Internet Defense
     * Intrusion Detection
     * Platform Security

FEATURED GUEST SPEAKER:
        * Amit Yoran, who was responsible for coordinating cyber-security
          activities for Homeland Security, will speak on "Security
          Challenges and Opportunities of the Future Enterprise"

PRE CONFERENCE WORKSHOP TOPICS INCLUDE:
     * Security in handling mobility on the Internet
     * Security in wireless LANs
     * Security for telephony or voice over IP
     * Trust relations in ad hoc networks
     * Key management strategies to support mobility
     * Security in RFID

     More information is available at:
        http://www.isoc.org/isoc/conferences/ndss/05/workshop.shtml
     Parties interested in submitting papers should see the above
     web page for directions.

REGISTER EARLY FOR BEST PRICING
     Registration for NDSS'05 is now open. Student rates are available
     for both the workshop and symposium. See the web site for more
     information -- http://www.isoc.org/ndss05/  Registration fees:
        *  November 31, 2004-January 10, 2005   $625.
        *  After January 10, 2005               $695.

SPONSORSHIP OPPORTUNITIES AVAILABLE!
     If your organization would like to help support NDSS and gain
     visibility through sponsoring, please contact:
     sponsor-ndss <at> isoc.org.  Information is also available at   
     http://www.isoc.org/ndss05/


Karen Seo
NDSS'05 Publicity Chair
Sam Hartman | 4 Jan 2005 02:39

Re: UTF8

>>>>> "der" == der Mouse <mouse <at> Rodents.Montreal.QC.CA> writes:

    >>> All the same UTF-8 issues I've raised repeatedly,
    >> My requirement as an IESG member is that it be possible to have
    >> a properly internationalized ssh.  Among other things that
    >> means the characters in usernames and passwords need to belong
    >> to some character set.

    der> Why?  Perhaps this is the fundamental part I'm missing.  What
    der> does "properly internationalized" mean - or perhaps more
    der> precisely, what is there about being "properly
    der> internationalized" that demands that usernames, passwords,
    der> and filenames consist of character sequences rather than
    der> octet sequences?

I'd appreciate replies off-list as I believe we are outside of the
scope of this working group.

Humans use our software.  They typically enter characters using an
input method for items such as usernames, passwords and
filenames.

One input method that is relatively common is a keyboard that maps
keycodes to characters.  This method tends to be mostly OK even if you
assume it produces octets not characters.

Other input methods produce different octet sequences for semantically
similar content.  I gave an example of combining accents vs single
characters in the message introducing this thread.  So, the set of
octets produced depends not so much on what the user does, what keys
they press, or even what is displayed on their screen; it depends on
implementation details of the input method the user's OS happens to
use.  The user does not typically have enough control (and almost
certainly has insufficient knowledge) to produce the other octet
sequences that carry the same semantic content.

So, if we treat passwords as octet-strings then whether the user can
type their username or password will depend on what input method they
are using.  They may not even have control over this.

We, the IETF, have decided for the most part that interoperability
requires that things work independent of what input method is used.
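
To make the combining-accent example concrete, here is a minimal Python
sketch (not from the original thread).  It assumes the two input methods
deliver UTF-8, and it uses NFC purely as an example normalization form;
which form a protocol should mandate is a separate question.  The helper
names octets and same_after_nfc are hypothetical.

```python
import unicodedata

# The same visible text, u-with-diaeresis, as two input methods might deliver it.
precomposed = "\u00fc"   # U+00FC LATIN SMALL LETTER U WITH DIAERESIS
combining = "u\u0308"    # "u" followed by U+0308 COMBINING DIAERESIS


def octets(s):
    """Encode a string as UTF-8 and return the octets as hex."""
    return s.encode("utf-8").hex()


def same_after_nfc(a, b):
    """Compare two strings after Unicode NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(octets(precomposed))                     # c3bc
print(octets(combining))                       # 75cc88
print(precomposed == combining)                # False: different octets
print(same_after_nfc(precomposed, combining))  # True: same characters
```

Compared as octet strings the two inputs differ; compared as normalized
character strings they match.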

der Mouse | 4 Jan 2005 06:23

Re: UTF8

>> What does "properly internationalized" mean - or perhaps more
>> precisely, what is there about being "properly internationalized"
>> that demands that usernames, passwords, and filenames consist of
>> character sequences rather than octet sequences?
> I'd appreciate replies off-list as I believe we are outside of the
> scope of this working group.

In general, perhaps, but insofar as it bears on implementing ssh, I am
inclined to disagree - which is why I'm replying on-list anyway.

> [...]
> We, the IETF, have decided for the most part that interoperability
> requires that things work independent of what input method is used.

As I see it, this amounts to "the IETF position is that humans think of
these things as character strings, so we demand that they be handled as
character strings by the protocol".

What is the IETF position, then, on how someone such as me should
handle the situation I'm faced with: writing software specified from
this point of view (ssh, in my case) for systems on which these
entities are _not_ character strings (a fairly traditional Unix
variant, NetBSD in my case)?  I'm faced with an encoding-agnostic
filesystem interface and implementation, wherein filename components
are sequences of octets not including 0x00 and 0x2f, independent of any
characters; I'm faced with password hashing routines that work with
octet strings, not character strings; etc.

Are such systems beyond the pale for the IETF, and I can do anything I
want, with a suggestion that I try to stay within something like the
spirit of the spec?  Is it simply not possible to implement ssh (or
anything else specified with similar normalization rules) on such a
system within the spec without converting all the affected code
(filename, username, and password handling in ssh's case) to the
character-string paradigm?  Am I required to reject attempted non-ASCII
strings in these places for no reason other than an inability to know
what the user intended the character set - if any - to be?  (For that
matter, what grounds are there for assuming that octets in the ASCII
range are intended to correspond to ASCII characters, rather than, say,
KOI-7?)

Or what?

Given how common such systems are, it seems a bit odd that the IETF
would take a position so apparently incompatible with them.  As an
implementer I find the situation rather confusing; there's obviously
something I don't understand going on, and I'd like to know what the
IETF's idea of the right thing for me to do here is.

/~\ The ASCII				der Mouse
\ / Ribbon Campaign
 X  Against HTML	       mouse <at> rodents.montreal.qc.ca
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Jeffrey Altman | 4 Jan 2005 08:41

Re: UTF8

der Mouse wrote:

> As I see it, this amounts to "the IETF position is that humans think of
> these things as character strings, so we demand that they be handled as
> character strings by the protocol".

Absolutely not.  The IETF position is that if I am attempting to login
to machine H via SSH, I should be able to do so by knowing the necessary
bits: username, password, etc.

The requirement is that no matter what user interface I use to enter
these bits, I should be able to successfully authenticate.  Now if I 
happen to be in front of a keyboard based interface which is Unicode
aware and happens to generate "SMALL LETTER u WITH DIAERESIS" as two
code points represented as two 32-bit values or 8 octets instead of the 
non-Unicode aware system which uses a single code point represented as
a single byte, I have a problem.  I type exactly the same thing on
both keyboards and get extremely different octet strings.
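
The size difference described above can be checked with a short Python
sketch (not from the original thread).  It assumes the Unicode-aware
system emits the decomposed form as big-endian UTF-32 without a byte
order mark and that the other system uses Latin-1; both are assumptions
chosen to match the numbers in the paragraph.

```python
decomposed = "u\u0308"   # "u" + COMBINING DIAERESIS: two code points
precomposed = "\u00fc"   # single code point U+00FC

# Two 32-bit values, big-endian, no byte order mark.
utf32 = decomposed.encode("utf-32-be")
# One octet in ISO 8859-1.
latin1 = precomposed.encode("latin-1")

print(len(utf32), utf32.hex())    # 8 0000007500000308
print(len(latin1), latin1.hex())  # 1 fc
```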

Are you telling me that once I configure a login to work from one
particular platform and user interface configuration that I should
be locked into that choice exclusive of all the other system types
and user input methods which are available?

I would find it hard to believe that anyone could decide that this is
desirable.

> What is the IETF position, then, on how someone such as me should
> handle the situation I'm faced with: writing software specified from
> this point of view (ssh, in my case) for systems on which these
> entities are _not_ character strings (a fairly traditional Unix
> variant, NetBSD in my case)?  I'm faced with an encoding-agnostic
> filesystem interface and implementation, wherein filename components
> are sequences of octets not including 0x00 and 0x2f, independent of any
> characters; I'm faced with password hashing routines that work with
> octet strings, not character strings; etc.

As an AFS developer I am very sympathetic to the situation. 
Unfortunately, there are no true raw octet strings.  Octet sequences are
created within a context and without knowing the context it is not
possible to properly manipulate the octets.  At the present time AFS
does not support a notion of storing character-set context information.
This causes severe problems for users who want to access the names
associated with directories and files from heterogeneous systems.
File names created from most Unix user interfaces in Western Europe
will produce strings using Latin-1 code points.  Those from Eastern 
Europe will use Latin-2.  Linux systems may store unnormalized UTF-8.
Windows systems will store one of the many IBM/MS DOS OEM code pages.
A name created on one system will not only be displayed incorrectly to
users of another system; it may be completely unparsable there.
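
A small Python sketch (not from the original thread) of what happens to
one stored name when clients assume different character sets.  The
example name and the helper view are hypothetical; the codec names are
Python's.

```python
# Octets a Latin-1 client would store for the name "smørbrød".
stored = "sm\u00f8rbr\u00f8d".encode("latin-1")


def view(raw, charset):
    """Decode raw octets as a client assuming `charset` would.

    Returns None when the octets are not valid in that charset.
    """
    try:
        return raw.decode(charset)
    except UnicodeDecodeError:
        return None


print(view(stored, "latin-1"))    # the name as intended
print(view(stored, "iso8859-2"))  # 0xF8 is r-caron in Latin-2: wrong letters
print(view(stored, "cp437"))      # 0xF8 is a degree sign in the DOS code page
print(view(stored, "utf-8"))      # None: 0xF8 is not valid UTF-8
```

The first three clients each show a different name for the same octets,
and the UTF-8 client cannot parse it at all.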

At the moment the only safe set of strings that can be used are those
restricted to US-ASCII.  This is because US-ASCII is the only common
set of values which will be properly interpreted without additional
context information which is not available.

In the long run we are going to need to fix AFS to do one of two
things:

(1) store context information associating the character set used to
     create each name AND provide the means necessary for file servers
     to be able to translate names from one character set to all the
     other possible sets.

(2) provide support for a normalized character set which is inclusive
     of all characters which users may be able to enter.

Having worked on the character set translation capabilities of C-Kermit
I can tell you that storing context information and providing 
translation is lossy and imperfect.  UNICODE solves the problem in a
much nicer and heterogeneous manner.  It is by no means perfect but
biting the bullet and supporting it makes the end user experience oh
so much nicer.

In the coming year I will be adding UNICODE support to AFS.  I expect
that all file systems will have to provide support for it in the years
to come.   Operating systems which do not provide support for character
set processing will find a smaller and smaller percentage of users.

> Are such systems beyond the pale for the IETF, and I can do anything I
> want, with a suggestion that I try to stay within something like the
> spirit of the spec?  Is it simply not possible to implement ssh (or
> anything else specified with similar normalization rules) on such a
> system within the spec without converting all the affected code
> (filename, username, and password handling in ssh's case) to the
> character-string paradigm?  Am I required to reject attempted non-ASCII
> strings in these places for no reason other than an inability to know
> what the user intended the character set - if any - to be?  (For that
> matter, what grounds are there for assuming that octets in the ASCII
> range are intended to correspond to ASCII characters, rather than, say,
> KOI-7?)

You have to draw the line somewhere if you are going to make progress
at improving cross platform user experience.  Systems without support
for character-set processing are useful only when all of the systems
they share information with are used in exactly the same context.
In a distributed heterogeneous environment such as the Internet,
this assumption cannot be made.

If a system wants to assume that all of its local input is in KOI-7
and an SSH implementation wants to be able to support that, then the
implementation must provide for character set translation from KOI-7
to UNICODE.  If you need such translation tables, they are available
from the UNICODE consortium and are implemented within a wide number
of open source packages.

> Or what?
> 
> Given how common such systems are, it seems a bit odd that the IETF
> would take a position so apparently incompatible with them.  As an
> implementer I find the situation rather confusing; there's obviously
> something I don't understand going on, and I'd like to know what the
> IETF's idea of the right thing for me to do here is.

You do what Kermit has done since 1981.  When moving information between
systems you convert from the local character set to a network neutral
form and then the receiver converts to its local form.  Before the advent
of UNICODE Kermit was forced to rely on the user to choose an intermediary
character-set which would be inclusive of all characters used and be
understood by both systems.  When this was not possible, substitution
rules and best guesses forced the data stream to become lossy.

With the availability of UNICODE the available set of characters which
can be sent without loss has been greatly enhanced.  Normalization rules
are used to prevent multiple representations of a common input form
from preventing interoperability.  While this has a negative impact on
the ability to display strings to the end user after use; it enhances
the ability to provide for cross platform comparison and computation.
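
The convert-normalize-convert scheme can be sketched in Python as
follows (not from the original thread).  It assumes both ends know
their local character set, uses UTF-8 with NFC as the network neutral
form, and substitutes "?" where the receiver's character set lacks a
character.  The function names to_network and from_network are
hypothetical.

```python
import unicodedata


def to_network(raw: bytes, local_charset: str) -> bytes:
    """Sender side: local octets -> normalized UTF-8 for the wire."""
    text = raw.decode(local_charset)
    return unicodedata.normalize("NFC", text).encode("utf-8")


def from_network(wire: bytes, local_charset: str) -> bytes:
    """Receiver side: UTF-8 from the wire -> local octets.

    Characters the local charset lacks become "?", i.e. the
    conversion is lossy for them.
    """
    return wire.decode("utf-8").encode(local_charset, errors="replace")


raw = bytes([0xF8])                     # r-caron in ISO 8859-2
wire = to_network(raw, "iso8859-2")
print(wire.hex())                       # c599
print(from_network(wire, "iso8859-2"))  # b'\xf8' (round trip is exact)
print(from_network(wire, "latin-1"))    # b'?'    (no r-caron in Latin-1)
```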

Jeffrey Altman

Niels Möller | 4 Jan 2005 13:45

Re: UTF8

der Mouse <mouse <at> Rodents.Montreal.QC.CA> writes:

> I'm faced with an encoding-agnostic
> filesystem interface and implementation, wherein filename components
> are sequences of octets not including 0x00 and 0x2f, independent of any
> characters;

Please leave the file system issues out of it for now. What's of
primary importance are the core drafts, and those deal with
usernames and passwords in utf8 form, *not* file names. The issues for
filenames, e.g. in sftp, are slightly different, and not relevant to
the core drafts.

> I'm faced with password hashing routines that work with
> octet strings, not character strings; etc.

> Am I required to reject attempted non-ASCII
> strings in these places for no reason other than an inability to know
> what the user intended the character set - if any - to be?  (For that
> matter, what grounds are there for assuming that octets in the ASCII
> range are intended to correspond to ASCII characters, rather than, say,
> KOI-7?)

I'm assuming you're talking about the server implementation now
(client side is comparatively trivial; convert input to utf8 based on
the current $LC_CTYPE). On the server side, problem is that at login
time, you don't know the user's $LC_CTYPE. My recommendation is as
follows:

1. Choose one default encoding (be that plain ascii, or latin1, or
   koi-7, or normalized utf-8, depending on your context and
   preference).

2. Provide an option for the sysadmin to say that on his or her
   particular system, some other character set is used for user names
   and passwords.

Then convert the usernames and passwords you get on the wire to the
selected encoding. That almost solves the problem, and it's no big
deal.

Optionally, to support systems where different users use different
character sets for their usernames and/or passwords, use some per user
configuration or kludgery to figure out the user's character set.
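
The recommendation above could look roughly like this in Python (a
sketch, not from the original thread and not taken from any ssh
implementation).  DEFAULT_CHARSET, wire_to_local and its parameters are
hypothetical names; how the sysadmin option and the per-user setting
are actually stored is left open.  NFC normalization is applied before
conversion so that decomposed input can still map to a single-octet
character.

```python
import unicodedata

DEFAULT_CHARSET = "ascii"  # step 1: the built-in default (an assumption)


def wire_to_local(wire, system_charset=None, per_user_charset=None):
    """Convert a UTF-8 username or password from the wire to local octets.

    system_charset   -- step 2: the sysadmin's override, if any
    per_user_charset -- optional per-user override, if any
    Returns None when the string cannot be represented locally.
    """
    charset = per_user_charset or system_charset or DEFAULT_CHARSET
    try:
        text = unicodedata.normalize("NFC", wire.decode("utf-8"))
        return text.encode(charset)
    except (UnicodeDecodeError, UnicodeEncodeError):
        return None


name = "g\u00e9rard".encode("utf-8")
print(wire_to_local(name))                            # None under the ascii default
print(wire_to_local(name, system_charset="latin-1"))  # b'g\xe9rard'
```

The resulting octets can then be handed to existing octet-string
routines such as the password hash.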

I'll be happy to discuss these implementation issues (my
implementation doesn't get non-ascii quite right yet either), but we
should probably do that off-list.

> Given how common such systems are, it seems a bit odd that the IETF
> would take a position so apparently incompatible with them.

Do you have some numbers to back that up? I've seen quite some number
of unix systems, but as far as I can recall, I've *never* seen one
where usernames and passwords used non-ascii characters. (I *have*
seen plenty of non-ascii filenames, but as I said, that's a different
issue, and irrelevant to the core drafts). I live in latin1-land, not
asia, though.

Best regards,
/Niels

der Mouse | 4 Jan 2005 17:14

Re: UTF8

>> As I see it, this amounts to "the IETF position is that humans think
>> of these things as character strings, so we demand that they be
>> handled as character strings by the protocol".
> Absolutely not.  The IETF position is that if I am attempting to
> login to machine H via SSH, I should be able to do so by knowing the
> necessary bits: username, password, etc.

But which is (say) the username?  The character string g e-acute r a r
d, or the octet string 0x67 0xe9 0x72 0x61 0x72 0x64?  A human is more
likely to think of it as the former; the reality to the computer is
more likely to be the latter.  (At least assuming an encoding-agnostic
user database such as is at issue here.)  So does "entering the username"
mean typing g e-acute r a r d (for any of the various ways of typing
those characters), or does it mean typing whatever is necessary to
generate 0x67 0xe9 0x72 0x61 0x72 0x64?  (Note that either or both may
be impossible to do under reasonably plausible circumstances.)
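
The two readings of "the username" can be compared directly in a short
Python sketch (not from the original thread).  It assumes the stored
octets were typed on a Latin-1 system and that a second client sends
UTF-8; the variable names are hypothetical.

```python
# The octet string held by an encoding-agnostic user database.
stored = bytes([0x67, 0xE9, 0x72, 0x61, 0x72, 0x64])

# The character string a human has in mind.
typed = "g\u00e9rard"

as_latin1 = typed.encode("latin-1")  # what a Latin-1 terminal sends
as_utf8 = typed.encode("utf-8")      # what a UTF-8 terminal sends

print(as_latin1 == stored)  # True: the octet model works from this client
print(as_utf8 == stored)    # False: same characters, different octets
print(as_utf8.hex(" "))     # 67 c3 a9 72 61 72 64

# Read under another charset the same octets are a different name:
# in KOI8-R, 0xE9 is a Cyrillic letter, not e-acute.
print(stored.decode("koi8_r") == typed)  # False
```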

The stated IETF position on interoperability makes no sense unless it's
based on the former of those two positions, which is why I phrased my
gloss on it the way I did.

> Are you telling me that once I configure a login to work from one
> particular platform and user interface configuration that I should be
> locked into that choice exclusive of all the other system types and
> user input methods which are available?

No; even if you go with the octet-string model, you are locked in only
to system types and input methods that permit you to generate that
octet string.

Very much the way, in fact, that the character-string model locks you
into the ability to generate the desired character string.

It's just a question of which lock you prefer to be in.

> In the long run we are going to need to fix AFS to do one of two
> things:

> (1) store context information associating the character set [...]
> (2) provide support for a normalized character set [...]

Only if AFS is (or becomes) philosophically committed to considering
file names to be character strings.  (While this may not be a wrong
choice, it is still a choice, and you seem to be arguing from a
position that is unaware of that.)

Character strings make a lot of sense from some points of view, yes -
and that's true not only of filenames but of other things, such as
usernames.  Character strings are a better match to the way most people
think of them, if nothing else.  But they bring a whole passel of
problems with them, some of which we're discussing here.

The biggest problem is perhaps the one that got me writing to the list
about this: a large body of existing code that takes the octet-string
point of view and what the best way is to impedance-match it to a spec
that takes the character-string point of view.

> You have to draw the line somewhere if you are going to make progress
> at improving cross platform user experience.

I guess what I don't quite see is how rendering ssh unimplementable (or
implementable only crippledly, such as by restricting everything to
ASCII) on traditional Unix systems is going to improve anything.
Honestly, what I expect it to do is to create two incompatible dialects
of ssh, one taking the character-string point of view and the other
taking the octet-string point of view, with humans required to deal
with the mismatch whenever they meet.  (There may be a third dialect
that imposes willy-nilly some guessed character set on the octet-string
environment....)

> Systems without support for character-set processing are useful only
> when all of the systems they share information with are used in
> exactly the same context.

I think that's too strong.  Rather, I would say, they allow mismatches
to show through in some form, usually in the form of text in one
character set being displayed in another and coming through as
nonsense.  This is not to say that they're _not_ useful in the face of
such things, just _less_ useful, or at least less transparently useful.

The corresponding upside, of course, is a simpler implementation and
more flexibility.

>> [...] I'd like to know what the IETF's idea of the right thing for
>> me to do here is.
> You do what Kermit has done since 1981.  When moving information
> between systems you convert from the local character set to a network
> neutral form

But this step cannot be done when I'm sending, because all I have is an
octet string.  I don't know what character set it's in; strictly
speaking, I don't even know whether it _is_ in a character set, though
for usernames and passwords it is extremely likely that it is, at least
in someone's mind (and for filenames it's reasonably likely).

> and then the receiver converts to its local form.

And this is equally impossible, for similar reasons.

/~\ The ASCII				der Mouse
\ / Ribbon Campaign
 X  Against HTML	       mouse <at> rodents.montreal.qc.ca
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

der Mouse | 4 Jan 2005 17:56

Re: UTF8

> Please leave the file system issues out of it for now.

Okay.  The fundamental issues are the same anyway, as I see them.

>> Am I required to reject attempted non-ASCII strings in these places
>> for no reason other than an inability to know what the user intended
>> the character set - if any - to be?
> I'm assuming you're talking about the server implementation now
> (client side is comparatively trivial; convert input to utf8 based on
> the current $LC_CTYPE).

Hm?  I have no LC_CTYPE in my environment and have got by fine without
it for years.  The only places I've been able to think of that anything
knows that I'm using 8859-1 (which is what I use by default) are:

- The font I use to display text;
- The Content-Type: header my MUA is configured to default to;
- $LESSCHARDEF (and that just knows which characters are printable,
  basically; it couldn't tell you 8859-1 vs 8859-2);
- The mapping table between what I might loosely call compose sequences
  and octet values my text editor uses;
- The display substitutes my text editor uses when displaying non-ASCII
  text on an ASCII-only display;
- The non-ASCII keycodes I've chosen to xmodmap onto real keys on my
  keyboard in my X startup scripts;
- My own mind.

I don't know what I should set LC_CTYPE to to indicate 8859-1.  I don't
even know where to look to find that out - I don't use non-ASCII for
much beyond inter-human communication (eg email).

>> Given how common such systems are, it seems a bit odd that the IETF
>> would take a position so apparently incompatible with them.
> Do you have some numbers to back that up?  I've seen quite some
> number of unix systems, but as far as I can recall, I've *never* seen
> one where usernames and passwords used non-ascii characters.

Come to think of it, I've never seen one where they were actually used
(or where I knew they were used, at least - for example, I have known
very few passwords besides my own).  I don't know how many I've seen
where they would work if attempted.  I don't even recall hearing of
anyone attempting it, either, regardless of the outcome; I'll have to
ask the more internationalized of my friends.

/~\ The ASCII				der Mouse
\ / Ribbon Campaign
 X  Against HTML	       mouse <at> rodents.montreal.qc.ca
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Sam Hartman | 4 Jan 2005 18:22

Re: UTF8

>>>>> "der" == der Mouse <mouse <at> Rodents.Montreal.QC.CA> writes:

    >>> What does "properly internationalized" mean - or perhaps more
    >>> precisely, what is there about being "properly
    >>> internationalized" that demands that usernames, passwords, and
    >>> filenames consist of character sequences rather than octet
    >>> sequences?
    >> I'd appreciate replies off-list as I believe we are outside of
    >> the scope of this working group.

    der> In general, perhaps, but insofar as it bears on implementing
    der> ssh, I am inclined to disagree - which is why I'm replying
    der> on-list anyway.

*sigh*  Teach me to try and explain something.

    der> What is the IETF position, then, on how someone such as me
    der> should handle the situation I'm faced with: writing software
    der> specified from this point of view (ssh, in my case) for
    der> systems on which these entities are _not_ character strings
    der> (a fairly traditional Unix variant, NetBSD in my case)?  I'm
    der> faced with an encoding-agnostic filesystem interface and
    der> implementation, wherein filename components are sequences of
    der> octets not including 0x00 and 0x2f, independent of any
    der> characters; I'm faced with password hashing routines that
    der> work with octet strings, not character strings; etc.

I think this is all mostly an open issue.  In practice what unix
server implementers seem to do is to treat the username and password
as octet-strings.  Niels has proposed ways of doing significantly
better than that.

I do think there is significant room here for implementers to decide
what works best, write it up and publish it.  Based on timing it would
probably work better as an informational RFC than as input to the core
drafts.

--Sam

Joseph Galbraith | 4 Jan 2005 18:30

Re: UTF8

Niels Möller wrote:
> der Mouse <mouse <at> Rodents.Montreal.QC.CA> writes:
>
>>I'm faced with password hashing routines that work with
>>octet strings, not character strings; etc.
> 
>>Am I required to reject attempted non-ASCII
>>strings in these places for no reason other than an inability to know
>>what the user intended the character set - if any - to be?  (For that
>>matter, what grounds are there for assuming that octets in the ASCII
>>range are intended to correspond to ASCII characters, rather than, say,
>>KOI-7?)
> 
> I'm assuming you're talking about the server implementation now
> (client side is comparatively trivial; convert input to utf8 based on
> the current $LC_CTYPE). On the server side, problem is that at login
> time, you don't know the user's $LC_CTYPE. My recommendation is as
> follows:
> 
> 1. Choose one default encoding (be that plain ascii, or latin1, or
>    koi-7, or normalized utf-8, depending on your context and
>    preference).
> 
> 2. Provide an option for the sysadmin to say that on his or her
>    particular system, some other character set is used for user names
>    and passwords.
> 
> Then convert the usernames and passwords you get on the wire to the
> selected encoding. That almost solves the problem, and it's no big
> deal.

Precisely.  By doing this you increase interoperability from
only those systems that use the same character set (i.e., only
koi-7 systems interoperate with each other) to interoperating
with all clients, as long as the same character set is used
on the server for passwords.

If this is too restrictive (i.e., different users on the same
server use different character sets for their passwords), do
this:

> Optionally, to support systems where different users use different
> character sets for their usernames and/or passwords, use some per user
> configuration or kludgery to figure out the user's character set.

For example, a non-script dot-file in the user's home directory
that you can read to get such useful information as $LC_CTYPE,
preferred umask, etc.  (Things you'd really like to know, but
don't want to run a user script to find out.)

>>Given how common such systems are, it seems a bit odd that the IETF
>>would take a position so apparently incompatible with them.
> 
> Do you have some numbers to back that up? I've seen quite some number
> of unix systems, but as far as I can recall, I've *never* seen one
> where usernames and passwords used non-ascii characters. (I *have*
> seen plenty of non-ascii filenames, but as I said, that's a different
> issue, and irrelevant to the core drafts). I live in latin1-land, not
> asia, though.

I will say that windows can and does use non-ascii usernames and
passwords, and it is not an uncommon operating system, though it
is not the most common of server platforms.

- Joseph

Niels Möller | 4 Jan 2005 19:41

Re: UTF8

der Mouse <mouse <at> Rodents.Montreal.QC.CA> writes:

> Hm?  I have no LC_CTYPE in my environment and have got by fine without
> it for years.  The only places I've been able to think of that anything
> knows that I'm using 8859-1 (which is what I use by default) are:

It's basically needed whenever information is transferred to and from
remote systems, using protocols that are charset aware. E.g. if you
use a text web-browser to access pages written in utf-8, your browser
will do a better job displaying the content if it knows if your terminal
is configured to use latin1, utf8 or koi8r.

I think the primary reason I started setting LC_CTYPE many years ago
was that GNU ls uses the standard C locale system and isprint() to
figure out which characters in filenames are printable, and which
should be displayed specially. Don't know how BSD ls behaves in this
respect.

  $ touch räksmörgås
  $ ls
  räksmörgås
  $ LC_CTYPE='' ls
  r?ksm?rg?s

> - The Content-Type: header my MUA is configured to default to;
> - $LESSCHARDEF (and that just knows which characters are printable,
>   basically; it couldn't tell you 8859-1 vs 8859-2);

These are application specific configurations. I think it's preferable
to use a single configuration mechanism, namely LC_CTYPE.

> I don't know what I should set LC_CTYPE to to indicate 8859-1.

If you're saying C locales are broken, I'll agree that they're broken
in several different ways. That's why I wrote "comparatively trivial",
not "trivial"...

IMO, it *ought* to work to simply set LC_CTYPE to "iso-8859-1" if
that's what you're using, and LC_PAPER to "a4" if you want that. But
the current state of affairs seems to be that the simple way doesn't
work, one has to figure out a corresponding geographic region. E.g. I
use LC_CTYPE=sv_SE, and that seems to work for me. If I ever wanted,
say, "b5" as the default paper size, I have no idea which region, if
any, that choice would correspond to.
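
For the client side, a minimal Python sketch (not from the original
thread) of deriving the character set from the locale environment and
converting typed octets to UTF-8.  It uses Python's locale module; the
function names terminal_charset and typed_to_utf8 are hypothetical, and
the name returned for a given LC_CTYPE value depends on the platform's
locale database.

```python
import locale


def terminal_charset():
    """Return the character encoding implied by the locale environment."""
    try:
        # Adopt LC_CTYPE (or LC_ALL/LANG) from the environment.
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        pass  # unknown locale name: keep the current locale
    return locale.getpreferredencoding(False)


def typed_to_utf8(raw, charset):
    """Convert octets typed in `charset` to UTF-8 for the wire."""
    return raw.decode(charset).encode("utf-8")


print(terminal_charset())
# Octets a Latin-1 terminal would produce for "räksmörgås":
print(typed_to_utf8(b"r\xe4ksm\xf6rg\xe5s", "latin-1"))
```

A user with no LC_CTYPE set gets whatever the platform default is, so a
real client would probably also want an explicit override.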

Regards,
/Niels

