Andreas Schwab | 1 Sep 01:02 2006

Re: UCS-2BE

Kenichi Handa <handa <at> m17n.org> writes:

> In article <jey7t59dqh.fsf <at> sykes.suse.de>, Andreas Schwab <schwab <at> suse.de> writes:
>
>> Kenichi Handa <handa <at> m17n.org> writes:
>>>> See <http://www.unicode.org/versions/Unicode4.0.0/appC.pdf>.
>>>
>>> It says nothing about "UCS-2BE", either.
>
>> C.2 [...]  The 32-bit form is referred to as UCS-4 (Universal Character
>> Set coded in 4 octets), and the 16-bit form is referred to as UCS-2
>> (Universal Character Set coded in 2 octets).
>
> ??? So, what is "UCS-2BE"?

As with every multi-octet encoding, you need to specify the byte order.

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab <at> suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."
Diane Murray | 1 Sep 01:25 2006

[Patch] url-http-create-request creates truncated paths

Argument and value pairs separated by semicolons in URLs, parsed as
attributes in URL/Emacs, are being left out of HTTP requests.  This
results in 404 errors or the wrong page being retrieved.  For example,
the request for the URL
<http://www.emacswiki.org/cgi-bin/wiki?action=browse;oldid=Gnus;id=GnusPage>
is "GET /cgi-bin/wiki/?action=browse HTTP/1.1" - note that everything
following the first semicolon is ignored in the request.
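For comparison, here is a minimal Python sketch of the expected result (it is not the url.el code, and `request_target` is a hypothetical helper name).  It builds the request target for the example URL and keeps the whole query string, semicolons included:

```python
from urllib.parse import urlsplit

def request_target(url):
    """Return the path-plus-query part that belongs in the HTTP request line."""
    parts = urlsplit(url)
    path = parts.path or "/"
    # The query is kept whole, semicolons included; the fragment (#target)
    # is not part of the request line.
    return path + ("?" + parts.query if parts.query else "")

url = "http://www.emacswiki.org/cgi-bin/wiki?action=browse;oldid=Gnus;id=GnusPage"
print("GET " + request_target(url) + " HTTP/1.1")
```

A request built this way carries `oldid` and `id` as well as `action`, which is what the server needs to return the right page.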

Additionally, I believe - in http:// URLs, at least - the target (like
#top) belongs at the very end of the URL, yet `url-recreate-url'
places it just before any attributes.  Please see the following patch
for fixes.

	* url-parse.el (url-recreate-url-attributes): New function, code
	simply moved from `url-recreate-url'.
	(url-recreate-url): Use it.  Put the `url-target' at the end of
	the URL after the attributes.

	* url-http.el (url-http-create-request): Use
	`url-recreate-url-attributes' when setting real-fname.

Index: url-parse.el
===================================================================
RCS file: /cvsroot/emacs/emacs/lisp/url/url-parse.el,v
retrieving revision 1.11
diff -c -r1.11 url-parse.el
*** url-parse.el	5 Feb 2006 23:15:07 -0000	1.11
--- url-parse.el	31 Aug 2006 22:54:10 -0000
***************
*** 100,116 ****
[...]

Jonathan Yavner | 1 Sep 01:36 2006

Re: UCS-2BE

> ??? So, what is "UCS-2BE"?

BE means "big-endian".  The more significant byte is stored first, 
followed by the less significant byte.  Also known as "network byte 
order".

UCS-2LE is what most people actually use on x86-based computers.  The 
less significant byte arrives before the more significant one in each 
16-bit quantity.
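
A small Python sketch of the two byte orders, for illustration only.  Python has no codec named UCS-2, so this assumes the `utf-16-be` and `utf-16-le` codecs as stand-ins, which produce the same bytes for a character in the BMP; the sample character is an arbitrary choice:

```python
ch = "\u3042"  # HIRAGANA LETTER A (U+3042), an arbitrary BMP example

big = ch.encode("utf-16-be")     # more significant byte first
little = ch.encode("utf-16-le")  # less significant byte first

print(big.hex())     # 3042
print(little.hex())  # 4230
```

The same 16-bit value 0x3042 comes out as the bytes 30 42 in big-endian order and 42 30 in little-endian order.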
Juri Linkov | 1 Sep 01:35 2006

Re: makeinfo 4.7

> Why would you not upgrade the server?  It has reached end of service
> long ago.  There are not even security updates anymore.

I can't decide what to install on every server where I have an account.

> You had the time to check out, configure and install Emacs from CVS,
> but did not have the time to run
>
> yum update texinfo
>
> (or fetch and install a suitable RPM from one of many repositories).

It is very easy to fetch and compile the latest version of Emacs from CVS
in the home directory on any GNU/Linux machine without root privileges
with just two commands: `cvs ... co emacs' and `./configure && make bootstrap'.
I don't want to get the latest version of any other program to be able
to build Emacs ;)

-- 
Juri Linkov
http://www.jurta.org/emacs/
Juri Linkov | 1 Sep 01:32 2006

Re: UCS-2BE

> If UCS-2BE is a mislabel of UTF-16BE, UCS-2BE can simply be
> an alias of UTF-16BE.  If UCS-2BE is a BMP subset of
> UTF-16BE, UCS-2BE should be implemented differently from
> UTF-16BE.

`UCS-2' is the fixed-length encoding of the BMP.  `UCS-2BE' is
a big-endian version of the UCS-2 encoding without using a BOM.
So, just as UCS-2 is a BMP subset of UTF-16, UCS-2BE is a BMP
subset of UTF-16BE (and UCS-2LE is a BMP subset of UTF-16LE).
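
The subset relation can be sketched in Python as follows.  This is only an illustration, not how iconv implements it: `encode_ucs2be` is a hypothetical helper, and it assumes Python's `utf-16-be` codec for the BMP case, since Python ships no UCS-2 codec:

```python
def encode_ucs2be(text):
    """Encode TEXT as big-endian UCS-2 without a BOM (hypothetical helper)."""
    for ch in text:
        if ord(ch) > 0xFFFF:
            raise ValueError("U+%04X is outside the BMP" % ord(ch))
    # For BMP-only text, UTF-16BE yields exactly one 16-bit unit per character.
    return text.encode("utf-16-be")

bmp = "A\u3042"
print(encode_ucs2be(bmp) == bmp.encode("utf-16-be"))  # True

astral = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, outside the BMP
print(astral.encode("utf-16-be").hex())  # d834dd1e (a surrogate pair)
```

For BMP text the two encoders agree byte for byte; only UTF-16BE can represent the character outside the BMP, and it needs two 16-bit units to do so.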

The encodings `UCS-2' and `UCS-2BE' are implemented in iconv
(http://www.gnu.org/software/libiconv/), so you could look
at the implementation of UCS-2BE:

http://libiconv.cvs.sourceforge.net/libiconv/libiconv/lib/ucs2be.h?revision=1.4&view=markup

Comparing it with the implementation of UTF-16BE, you can see that
UTF-16BE deals also with other planes:

http://libiconv.cvs.sourceforge.net/libiconv/libiconv/lib/utf16be.h?revision=1.4&view=markup

And comparing UCS-2BE with the implementation of UCS-2, you can see that
UCS-2 also deals with a BOM:

http://libiconv.cvs.sourceforge.net/libiconv/libiconv/lib/ucs2.h?revision=1.4&view=markup

The iconv implementations of UCS-2 and UTF-16 differ in one respect
when outputting a BOM:

http://libiconv.cvs.sourceforge.net/libiconv/libiconv/lib/utf16.h?revision=1.4&view=markup
[...]

Juri Linkov | 1 Sep 01:34 2006

Re: How to stop find-grep-dired?

> I'd strongly recommend C-c C-k as a binding to major mode authors to
> stop a process run on behalf of the major mode _if_ there is a good
> case for such a killing, and regardless of whether this process is run
> in a different buffer or not (for example, for the purpose of
> capturing output).

Do you know why comint doesn't bind C-c C-k to comint-kill-subjob?
Are there any reasons not to do so?

-- 
Juri Linkov
http://www.jurta.org/emacs/
David Abrahams | 1 Sep 02:47 2006

Re: Why are <next> and <prior> not called <page down> and <page up>?

"Drew Adams" <drew.adams <at> oracle.com> writes:

> but <prior> and <next> are standard names, which means that users
> can find things out about them (e.g.  Google).

Have you tried that?  Those words are so common and non-specific that
I doubt it would turn up much of use.

-- 
Dave Abrahams
Boost Consulting
www.boost-consulting.com
Juliusz Chroboczek | 1 Sep 02:16 2006

xml-parse-region broken?

Try doing:

  (require 'xml)
  (with-temp-buffer
    (insert "<a>\n</a>\n")
    (xml-parse-region 1 (point-max)))

With 21.4.1, this gives ((a nil)).  With 22.0.50.1 (Debian version
1:20060824-1), it gives ((a nil "\n")).

                                        Juliusz Chroboczek
Kenichi Handa | 1 Sep 03:19 2006

Re: UCS-2BE

In article <87ac5ko50j.fsf <at> jurta.org>, Juri Linkov <juri <at> jurta.org> writes:

> `UCS-2' is the fixed-length encoding of the BMP.  `UCS-2BE' is
> a big-endian version of the UCS-2 encoding without using a BOM.
> So, just as UCS-2 is a BMP subset of UTF-16, UCS-2BE is a BMP
> subset of UTF-16BE (and UCS-2LE is a BMP subset of UTF-16LE).

Where did you get that info?

The word "encoding" is ambiguous here.  There are "CEF
(Character Encoding Form)" and "CES (Character Encoding
Scheme)".  Unicode says (see Glossary):

Character Encoding Form: Mapping from a character set
definition to the actual code units used to represent the
data.

Character Encoding Scheme: A character encoding form plus
byte serialization. ...

UCS-XXX are CEF, and UTF-XXX are CES.  So, UCS-XXX are not
appropriate label names for specifying how to byte-serialize
characters (i.e. when saving characters in a file).  At least,
that is the official definition in Unicode.
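
The difference between the two glossary terms can be pictured with a short Python sketch.  It is an illustration under simple assumptions (BMP-only sample text, 16-bit code units), not a statement of how any library defines these names:

```python
import struct

text = "A\u3042"  # arbitrary BMP-only sample

# Encoding form: a sequence of 16-bit code units, with no byte order implied.
code_units = [ord(ch) for ch in text]  # valid only for BMP characters

# Encoding scheme: the same code units serialized to bytes in a chosen order.
as_be = b"".join(struct.pack(">H", u) for u in code_units)
as_le = b"".join(struct.pack("<H", u) for u in code_units)

print([hex(u) for u in code_units])  # ['0x41', '0x3042']
print(as_be.hex(), as_le.hex())      # 00413042 41004230
```

The list of integers is the same in both cases; a byte order only enters once the code units are written out as octets.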

And, as you can see now, there is a contradiction in the term
"UCS-2BE" because "BE" is information about
byte-serialization.  But the term "UCS-2BE" itself is not
defined in Unicode.  So, there are two possibilities:

[...]

Kenichi Handa | 1 Sep 03:22 2006

Re: UCS-2BE

In article <jeirk8ik4p.fsf <at> sykes.suse.de>, Andreas Schwab <schwab <at> suse.de> writes:

>>> C.2 [...]  The 32-bit form is referred to as UCS-4 (Universal Character
>>> Set coded in 4 octets), and the 16-bit form is referred to as UCS-2
>>> (Universal Character Set coded in 2 octets).
>>
>> ??? So, what is "UCS-2BE"?

> As with every multi-octet encoding, you need to specify the byte order.

You are also confusing CEF and CES.  Please see my reply to
Juri.

---
Kenichi Handa
handa <at> m17n.org
