Mike O'Dell | 1 Feb 2005 05:26

Re: Question on compath() in sbr/path.c

there are two ways to do filenames

one is purely as text strings, which can admit
a number of algebras, one of which provides for
having dot-dot reflect the temporal state which caused
the current string to be what it is, and all
name operations are done with string algebra.

the other way is using a directed graph as 
the implementation.  the Unix filesystem
works this way. dot and dot-dot are implemented
as explicit pointers to directory inodes.
if the directed graph (directory graph) is used
to evaluate dot-dot, the parent is the one
reflected in the graph.

there is no "right" answer - one can argue that
in some cases one is more consistent than the other.

if dot-dot is to be recorded in the filesystem, then
its value cannot be context-sensitive based on the
computation history of the process.

if dot-dot is evaluated with string algebra, as
is done by many shell "builtin" cd functions,
then the value of dot-dot can indeed be context-sensitive
and reflect whatever is desired. the usual context
is the temporal sequence of "cd" operations done by
the shell to get to a given directory.
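
The two evaluation strategies can be contrasted in a short Python sketch. This is only an illustration, not code from sbr/path.c: the helper names `logical_parent`, `physical_parent`, and `demo` are invented here, and the example assumes a POSIX system where symlinks can be created.

```python
import os
import tempfile


def logical_parent(path):
    """Evaluate dot-dot with string algebra only; the filesystem is never consulted."""
    return os.path.normpath(os.path.join(path, ".."))


def physical_parent(path):
    """Evaluate dot-dot through the directory graph; symlinks are resolved first."""
    return os.path.realpath(os.path.join(path, ".."))


def demo():
    """Build root/a/b plus a symlink root/link -> root/a/b and compare both parents."""
    with tempfile.TemporaryDirectory() as tmp:
        root = os.path.realpath(tmp)
        os.makedirs(os.path.join(root, "a", "b"))
        link = os.path.join(root, "link")
        os.symlink(os.path.join(root, "a", "b"), link)
        # Express both answers relative to root so they are easy to compare.
        logical = os.path.relpath(logical_parent(link), root)
        physical = os.path.relpath(physical_parent(link), root)
    return logical, physical
```

Calling `demo()` yields a pair of parents for the same name: the string answer is the directory the symlink lives in, while the graph answer is the parent of the directory the symlink points to.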

Paul Fox | 14 Feb 2005 04:45

scan or show of UTF-encoded headers?


can nmh decode UTF or otherwise-encoded headers?  it's not that
i _want_ to be able to read all of the UTF-encoded spam i get, but
i recently, for the very first time, got a legitimate piece of
mail with encoded Subject:, From:, and To: lines.  i'd like to
be better prepared for next time...

paul
=---------------------
 paul fox, pgf <at> foxharp.boston.ma.us (arlington, ma, where it's 19.8 degrees)

_______________________________________________
Nmh-workers mailing list
Nmh-workers <at> nongnu.org
http://lists.nongnu.org/mailman/listinfo/nmh-workers

Oliver Kiddle | 14 Feb 2005 11:00

Re: scan or show of UTF-encoded headers?

Paul Fox wrote:
>  
> can nmh decode UTF or otherwise-encoded headers?  it's not that

Yes. See the decode function in the mh-format manual page. It has a few
limitations, however. It doesn't use iconv or similar to convert headers
to the current encoding, so you need to use a UTF-8 locale and set the
MM_CHARSET environment variable to UTF-8. That means that it then won't
decode an ISO-8859-1 header anymore.
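
For comparison, here is a minimal Python sketch of header decoding that is not tied to a single charset. It uses the standard library's `email.header.decode_header` and is not nmh's decode function; `decode_rfc2047` is a name invented for this example. Because each chunk is converted to Unicode using its own declared charset, ISO-8859-1 and UTF-8 headers can both be decoded in the same session.

```python
from email.header import decode_header


def decode_rfc2047(value):
    """Decode RFC 2047 encoded-words in a header value into a Unicode string."""
    pieces = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            # No declared charset means plain ASCII text between encoded-words.
            pieces.append(chunk.decode(charset or "ascii", errors="replace"))
        else:
            # A header without any encoded-word comes back as str, unchanged.
            pieces.append(chunk)
    return "".join(pieces)
```

A header that names a charset Python does not know would raise `LookupError` here; a real tool would need to decide how to handle that case.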

> i _want_ to be able to read all of the UTF-encoded spam i get, but

The thing I find with spam is that it always seems to break the RFC by
including space characters in the encoded section of the header. I don't
know whether this is also common in legitimate mail, but nmh doesn't
decode such headers. The relevant code is in sbr/fmt_rfc2047.c if you're
interested in looking.
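
The kind of breakage described here can be detected with a small check. The following Python sketch is not taken from sbr/fmt_rfc2047.c; the two regular expressions are simplified approximations of the RFC 2047 encoded-word grammar, and `malformed_encoded_words` is a name made up for the example. RFC 2047 does not allow a space inside the encoded text, so anything that looks like an encoded-word but contains whitespace is reported.

```python
import re

# Simplified strict form: =?charset?encoding?encoded-text?= with no whitespace
# and no "?" inside the charset or the encoded text.
STRICT = re.compile(r"=\?[^?\s]+\?[bBqQ]\?[^?\s]+\?=")

# Looser form that also accepts whitespace, used to find lookalikes.
LOOSE = re.compile(r"=\?[^?]+\?[bBqQ]\?[^?]*\?=")


def malformed_encoded_words(header_value):
    """Return encoded-word lookalikes that the strict pattern rejects."""
    return [
        match.group(0)
        for match in LOOSE.finditer(header_value)
        if not STRICT.fullmatch(match.group(0))
    ]
```

An empty result means every encoded-word candidate in the header passed the strict pattern; whether to decode the rejected ones anyway is a policy choice.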

Oliver

Paul Fox | 14 Feb 2005 15:39

Re: scan or show of UTF-encoded headers?

 > Paul Fox wrote:
 > >  
 > > can nmh decode UTF or otherwise-encoded headers?  it's not that
 > 
 > Yes. See the decode function in the mh-format manual page. It has a few
 > limitations however. It doesn't use iconv or similar to convert headers
 > to the current encoding. So you need to use a UTF-8 locale and set the
 > MM_CHARSET environment variable to UTF-8. That means that it then won't
 > decode an ISO-8859-1 header anymore.

hmmm.  i'll play with it.  does anyone have any clever scripts to
wrap this up into a nice solution?

 > 
 > > i _want_ to be able to read all of the UTF-encoded spam i get, but

somehow when you remove the line that preceded that one, it makes me
sound like a nutcase, eh?  :-)

paul
=---------------------
 paul fox, pgf <at> foxharp.boston.ma.us (arlington, ma, where it's 27.1 degrees)

Harald Geyer | 14 Feb 2005 17:00

Re: scan or show of UTF-encoded headers?

>  > Paul Fox wrote:
>  > >  
>  > > can nmh decode UTF or otherwise-encoded headers?  it's not that
>  > 
>  > Yes. See the decode function in the mh-format manual page. It has a few
>  > limitations however. It doesn't use iconv or similar to convert headers
>  > to the current encoding. So you need to use a UTF-8 locale and set the
>  > MM_CHARSET environment variable to UTF-8. That means that it then won't
>  > decode an ISO-8859-1 header anymore.
> 
> hmmm.  i'll play with it.  does anyone have any clever scripts to
> wrap this up into a nice solution?

What do you consider a nice solution? I use the method as described
by Oliver (actually that's the default of the Debian package). It works
satisfactorily, but unfortunately we have a wild mixture of latin1 and latin9
in Europe (thanks to MS Windows not being able or willing to adapt to
the new situation in the past four years), so half of the mails I
get aren't decoded at all. If anybody has a patch or another solution,
I would be interested as well.

Harald

Paul Fox | 14 Feb 2005 17:25

Re: scan or show of UTF-encoded headers?

 > >  > Yes. See the decode function in the mh-format manual page. It has a few
 > >  > limitations however. It doesn't use iconv or similar to convert headers
 > >  > to the current encoding. So you need to use a UTF-8 locale and set the
 > >  > MM_CHARSET environment variable to UTF-8. That means that it then won't
 > >  > decode an ISO-8859-1 header anymore.
 > > 
 > > hmmm.  i'll play with it.  does anyone have any clever scripts to
 > > wrap this up into a nice solution?
 > 
 > What do you consider a nice solution? I use the method as described
 > by Oliver (actually that's the default of the Debian package). It works
 > satisfactorily, but unfortunately we have a wild mixture of latin1 and latin9
 > in Europe (thanks to MS Windows not being able or willing to adapt to
 > the new situation in the past four years), so half of the mails I
 > get aren't decoded at all. If anybody has a patch or another solution,

i guess i was thinking of a wrapper for scan or show that took care
of setting up the locale and charset, either via argument for manually
choosing, or maybe even by examining the message and then figuring out
what locale/charset it should probably use, this time.
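
A wrapper along these lines might look like the following Python sketch. It is only a rough illustration: `guess_header_charset`, `nmh_env`, and `run_nmh` are hypothetical names, the locale name passed in is whatever the local system provides, and the only nmh-specific setting used is the MM_CHARSET environment variable mentioned earlier in the thread. The charset can be given manually or guessed from an encoded header.

```python
import os
import subprocess
from email.header import decode_header


def guess_header_charset(header_value):
    """Return the charset named by the first encoded-word, or None if there is none."""
    for _chunk, charset in decode_header(header_value):
        if charset:
            return charset.upper()
    return None


def nmh_env(charset, locale_name=None, base=None):
    """Copy the environment and set MM_CHARSET (and optionally LC_ALL) for one command."""
    env = dict(os.environ if base is None else base)
    env["MM_CHARSET"] = charset
    if locale_name is not None:
        # The locale name is system-specific; the caller must supply a valid one.
        env["LC_ALL"] = locale_name
    return env


def run_nmh(command, args, charset, locale_name=None):
    """Run an nmh command such as scan or show under the adjusted environment."""
    completed = subprocess.run([command, *args], env=nmh_env(charset, locale_name))
    return completed.returncode
```

For instance, `run_nmh("show", ["last"], "ISO-8859-1")` would run show with MM_CHARSET set for that one invocation only. The sketch changes nothing about the terminal, which still has to be able to display the chosen charset.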

(i confess i don't exchange a lot of mail with non-english/
ascii-speaking correspondents, and being an american/english/ascii
guy myself, have never really had to adjust locales or charsets etc.
which is to say, i may not fully understand what i'm asking for.  :-)

paul
=---------------------
 paul fox, pgf <at> foxharp.boston.ma.us (arlington, ma, where it's 32.5 degrees)

Martin McCormick | 10 Feb 2005 21:29

refile Sometimes totally Shreds a Message

	I use nmh-1.0.4 in FreeBSD UNIX and have noticed that the
refile function occasionally eats a message.  It moves it from one
folder to another all right, but what ends up in the receiving folder
is a file containing all 0xFF's.

	I have tried to capture a message that triggers this behavior
but it is difficult since most messages do not self-destruct and refile correctly.
When one does shred, I can't get it back to experiment with because,
by definition of the problem, it is simply gone.

Martin McCormick WB5AGZ  Stillwater, OK 
OSU Information Technology Division Network Operations Group

Oliver Kiddle | 14 Feb 2005 19:34

Re: scan or show of UTF-encoded headers?

You wrote:

> i guess i was thinking of a wrapper for scan or show that took care
> of setting up the locale and charset, either via argument for manually
> choosing, or maybe even by examining the message and then figuring out
> what locale/charset it should probably use, this time.

It's probably easier to hack the C code. I've had a quick go at
producing something which uses iconv to convert stuff to the native
character set (patch is below). Would be good if you could try this out
and look for ways to improve it.
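
The general idea of converting decoded headers to the native character set can be sketched in Python as follows. This is not the patch itself: Python's built-in codecs stand in for iconv, and `header_to_native` is a hypothetical name. Each encoded-word is decoded using its declared charset and the result is re-encoded in the native charset, with unrepresentable characters replaced by "?".

```python
from email.header import decode_header


def header_to_native(value, native_charset):
    """Decode encoded-words, then re-encode the text in the native charset."""
    text = "".join(
        chunk.decode(charset or "ascii", errors="replace")
        if isinstance(chunk, bytes)
        else chunk
        for chunk, charset in decode_header(value)
    )
    # Characters the native charset cannot represent become "?".
    return text.encode(native_charset, errors="replace")
```

Replacing unrepresentable characters is one possible policy; leaving the header undecoded in that case would be another.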

I've not thought through what the between_encodings stuff is doing and
if that is affected at all. If this is going to be turned into something
we can commit to CVS, we also need to work out the necessary configure
stuff for iconv. As it is, you may need to fiddle the Makefile to get
this to compile.

Oliver

Index: h/prototypes.h
===================================================================
RCS file: /cvsroot/nmh/nmh/h/prototypes.h,v
retrieving revision 1.9
diff -u -r1.9 prototypes.h
--- h/prototypes.h	27 Jan 2005 16:26:24 -0000	1.9
+++ h/prototypes.h	14 Feb 2005 18:18:38 -0000
@@ -61,6 +61,7 @@
 char **getans (char *, struct swit *);
 int getanswer (char *);
Harald Geyer | 14 Feb 2005 19:35

Re: scan or show of UTF-encoded headers?

>  > What do you consider a nice solution? I use the method as described
>  > by Oliver (actually that's the default of the Debian package). It works
>  > satisfactorily, but unfortunately we have a wild mixture of latin1 and latin9
>  > in Europe (thanks to MS Windows not being able or willing to adapt to
>  > the new situation in the past four years), so half of the mails I
>  > get aren't decoded at all. If anybody has a patch or another solution,
> 
> i guess i was thinking of a wrapper for scan or show that took care
> of setting up the locale and charset, either via argument for manually
> choosing, or maybe even by examining the message and then figuring out
> what locale/charset it should probably use, this time.

If one wants to do it manually, 'export MM_CHARSET="ISO-8859-1"' works
for me, but usually you don't do that, because having a correctly decoded
subject isn't worth the typing. Also, of course, the terminal must
be able to handle the charset. With latin1 and latin9 there is no
problem, but if you want UTF-8 you need to change your terminal too,
with whatever tool your OS provides for that. With scan that wouldn't
work at all, because you can have any number of different charsets in
the headers of the many messages in one folder.

Obviously any script which tries to do the above runs into the same
problem that prevents nmh from doing it itself: The script would need
to know which charsets the terminal can handle and how to tell it.
Also changing the terminal might confuse other programs.

I guess it would be much easier and less prone to error to just
implement transcoding of messages through iconv instead of trying
to adapt the display on a per message basis.

Valdis.Kletnieks | 14 Feb 2005 21:34

Re: scan or show of UTF-encoded headers?

On Mon, 14 Feb 2005 19:35:36 +0100, Harald Geyer said:

> Obviously any script which tries to do the above runs into the same
> problem that prevents nmh from doing it itself: The script would need
> to know which charsets the terminal can handle and how to tell it.
> Also changing the terminal might confuse other programs.
> 
> I guess it would be much easier and less prone to error to just
> implement transcoding of messages through iconv instead of trying
> to adapt the display on a per message basis.

In general, you *can't* do a good job of using iconv to mash things between
the various iso8859-* charsets.  There *will* be lossage - after all, there
is a *reason* they're up to -15, namely that one isn't sufficient.  So whichever
one you're in, there *will* be lossage for the other 14.

On the flip side, it's possible to do lossless conversion *from* any 8859-*
into the UTF-8 space.  So a better solution is to teach the code that
currently handles MM_CHARSET that, if the user is in a UTF-8 environment,
it should use iconv to convert 8859 to UTF-8.
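
The asymmetry is easy to demonstrate. The following Python sketch uses Python's codecs rather than iconv, and the helper names `transcode` and `survives_round_trip` are invented for the example. It takes the euro sign, which is byte 0xA4 in ISO-8859-15 and has no equivalent in ISO-8859-1, as the test case.

```python
def transcode(raw, source, target):
    """Convert bytes between charsets; unrepresentable characters become "?"."""
    return raw.decode(source).encode(target, errors="replace")


def survives_round_trip(raw, source, via):
    """True if converting source -> via -> source gives back the original bytes."""
    return transcode(transcode(raw, source, via), via, source) == raw


EURO_LATIN9 = b"\xa4"  # the euro sign as encoded in ISO-8859-15

# Going through UTF-8 keeps the euro sign intact.
utf8_ok = survives_round_trip(EURO_LATIN9, "iso-8859-15", "utf-8")

# Going through ISO-8859-1 loses it, because latin1 has no euro sign.
latin1_ok = survives_round_trip(EURO_LATIN9, "iso-8859-15", "iso-8859-1")
```

Here `utf8_ok` is true and `latin1_ok` is false, which is the case for converting towards UTF-8 rather than between 8859 variants.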

And yes, it's possible that the user is in a utf-8 environment, but doesn't
have actual font glyphs for all the planes (so, for instance Hebrew or
Cyrillic characters don't display).  This is actually a non-issue, for 2 reasons:

1) If they don't have the Hebrew glyphs installed, there's nothing you could
have done anyhow.

2) On the other hand, it's fairly safe to assume that if they're in a UTF-8
locale, that their software has at least enough smarts to put up a "unknown