Luís Oliveira | 2 Jan 23:05

Problem with gbk-map.lisp

Hello Wenpeng,

I missed an obvious problem with your patch: it depends on Lisps
loading gbk-map.lisp using the UTF-8 encoding. AFAIK, there's no good
way to enforce that portably via ASDF, so I've converted the file to
an ASCII representation. It's not as pretty as the previous version,
but it seems to work.

Cheers,

-- 
Luís Oliveira
http://r42.eu/~luis/
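
The ASCII-only idea described above can be sketched roughly as follows. This is only an illustration, not the actual contents of gbk-map.lisp: the names *sample-code-points*, code-points-to-string and string-to-code-points are made up, and the two sample code points are arbitrary. The sketch assumes a Lisp whose characters cover Unicode (SBCL and CCL do). Because the source contains only integers, it reads the same whatever encoding the Lisp uses to load the file.

```lisp
;;; Hypothetical sketch; none of these names are part of babel.
;;; The file contains only ASCII, so the external format used by
;;; LOAD or COMPILE-FILE does not affect the data.

(defparameter *sample-code-points*
  #(#x4E2D #x6587)                      ; U+4E2D and U+6587
  "Code points stored as integers rather than as literal characters.")

(defun code-points-to-string (code-points)
  "Build a string from a sequence of Unicode code points."
  (map 'string #'code-char code-points))

(defun string-to-code-points (string)
  "Inverse of CODE-POINTS-TO-STRING: return a vector of code points."
  (map 'vector #'char-code string))
```

The cost is readability: the table no longer shows the characters themselves, only their numeric codes.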
Anton Vodonosov | 28 Dec 22:21

Unit tests failures on different lisps

Hello.

I am running the tests of some of the most frequently downloaded Quicklisp libraries, including babel.

The Babel tests produce a different number of failures/errors on different Lisps (about 5, 8 or 9).

You can find the results here: http://common-lisp.net/project/cl-test-grid/pivot_ql-lib_lisp.html

Clicking an ok/fail status takes you to the library's test log, where you can see which failures
occurred.

Best regards,
- Anton
Xiaofeng Yang | 22 Dec 15:55

Re: Submit a GBK patch



I suggest this for both the GB2312 and GBK encodings (these two encodings are very important for Chinese users):
http://trac.clozure.com/ccl/changeset/14911
I've tested this on CCL using the encoding tables and tests from GNU's iconv.

Most of babel's code came from CCL, so I think integrating this into babel would be easy and more suitable, even though it currently exists only for CCL.

     Best regards,
Xiaofeng Yang


2011/12/21 levin <levin108 <at> gmail.com>
Hi all,

A few days ago I found the babel project and tried to use it in my own project. As a Chinese user,
I found that babel doesn't support the GBK encoding, so I wrote a patch that adds GBK
support.

I have only been using Common Lisp for a short time, so if something is not
good enough, please let me know. I hope you can accept this patch, or help me
modify it, so that babel supports GBK and we can use it freely to process Chinese text.

Thanks.

--
levin

_______________________________________________
babel-devel mailing list
babel-devel <at> common-lisp.net
http://lists.common-lisp.net/cgi-bin/mailman/listinfo/babel-devel



levin | 21 Dec 12:21

Submit a GBK patch

Hi all,

A few days ago I found the babel project and tried to use it in my own project. As a Chinese user,
I found that babel doesn't support the GBK encoding, so I wrote a patch that adds GBK
support.

I have only been using Common Lisp for a short time, so if something is not
good enough, please let me know. I hope you can accept this patch, or help me
modify it, so that babel supports GBK and we can use it freely to process Chinese text.

Thanks.

--
levin


Attachment (0002-New-encoding-GBK.patch): application/octet-stream, 75 KiB
Dmitry Ignatiev | 27 Oct 15:46

Few fixes for ucs-2 and utf-32

Hi there.

There were some BOM-related bugs in the Unicode decoders.

A patch is attached.
Attachment (enc-unicode.diff): application/octet-stream, 1036 bytes
Nicolas Martyanoff | 23 Apr 16:50

patch for cp1252


Hi,

I added support for the cp1252 encoding:

Sat Apr 23 16:41:22 CEST 2011  khaelin <at> gmail.com
  * add support for cp1252 encoding
diff -rN -u old-babel/babel.asd new-babel/babel.asd
--- old-babel/babel.asd	2011-04-23 16:49:25.088659055 +0200
+++ new-babel/babel.asd	2011-04-23 16:49:25.091992342 +0200
@@ -41,6 +41,7 @@
      (:file "enc-iso-8859")
      (:file "enc-unicode")
      (:file "enc-cp1251")
+     (:file "enc-cp1252")
      (:file "jpn-table")
      (:file "enc-jpn")
      (:file "external-format")
diff -rN -u old-babel/src/enc-cp1252.lisp new-babel/src/enc-cp1252.lisp
--- old-babel/src/enc-cp1252.lisp	1970-01-01 01:00:00.000000000 +0100
+++ new-babel/src/enc-cp1252.lisp	2011-04-23 16:49:25.098658916 +0200
@@ -0,0 +1,81 @@
+;;;; -*- Mode: lisp; indent-tabs-mode: nil -*-
+;;;
+;;; enc-cp1252.lisp --- Implementation of the CP1252 character encoding.
+;;;
+;;; Copyright (C) 2011, Nicolas Martyanoff
+;;;
+;;; Permission is hereby granted, free of charge, to any person
+;;; obtaining a copy of this software and associated documentation
+;;; files (the "Software"), to deal in the Software without
+;;; restriction, including without limitation the rights to use, copy,
+;;; modify, merge, publish, distribute, sublicense, and/or sell copies
+;;; of the Software, and to permit persons to whom the Software is
+;;; furnished to do so, subject to the following conditions:
+;;;
+;;; The above copyright notice and this permission notice shall be
+;;; included in all copies or substantial portions of the Software.
+;;;
+;;; THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+;;; EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+;;; MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+;;; NONINFRINGEMENT.  IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
+;;; HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
+;;; WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+;;; OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+;;; DEALINGS IN THE SOFTWARE.
+
+(in-package #:babel-encodings)
+
+(define-character-encoding :cp1252
+    "An 8-bit, fixed-width character encoding used by Windows for Western
+    European languages."
+  :aliases '(:windows-1252)
+  :literal-char-code-limit 256)
+
+(define-constant +cp1252-to-unicode+
+    #(#x20ac    nil #x201a #x0192 #x201e #x2026 #x2020 #x2021
+      #x02c6 #x2030 #x0160 #x2039 #x0152    nil #x017d    nil
+         nil #x2018 #x2019 #x201c #x201d #x2022 #x2013 #x2014
+      #x02dc #x2122 #x0161 #x203a #x0153    nil #x017e #x0178)
+  :test #'equalp)
+
+(define-unibyte-decoder :cp1252 (octet)
+  (if (and (>= octet #x80) (<= octet #x9f))
+      (svref +cp1252-to-unicode+
+             (the ub8 (- octet #x80)))
+      octet))
+
+(define-constant +unicode-0152-017e-cp1252+
+    #(#x8c #x9c #x00 #x00 #x00 #x00 #x00 #x00
+      #x00 #x00 #x00 #x00 #x00 #x00 #x8a #x9a
+      #x00 #x00 #x00 #x00 #x00 #x00 #x00 #x00
+      #x00 #x00 #x00 #x00 #x00 #x00 #x00 #x00
+      #x00 #x00 #x00 #x00 #x00 #x00 #x9f #x00
+      #x00 #x00 #x00 #x8e #x9e)
+  :test #'equalp)
+
+(define-constant +unicode-2013-203a-cp1252+
+    #(#x96 #x97 #x00 #x00 #x00 #x91 #x92 #x82
+      #x00 #x93 #x94 #x84 #x00 #x86 #x87 #x95
+      #x00 #x00 #x00 #x85 #x00 #x00 #x00 #x00
+      #x00 #x00 #x00 #x00 #x00 #x89 #x00 #x00
+      #x00 #x00 #x00 #x00 #x00 #x00 #x8b #x9b)
+  :test #'equalp)
+
+(define-unibyte-encoder :cp1252 (code)
+  (cond
+    ((or (< code #x80)
+         (and (>= code #xa0) (<= code #xff)))
+     code)
+    ((and (>= code #x0152) (<= code #x017e))
+     (svref +unicode-0152-017e-cp1252+
+            (the ub8 (- code #x0152))))
+    ((= code #x0192) #x83)
+    ((= code #x02c6) #x88)
+    ((= code #x02dc) #x98)
+    ((and (>= code #x2013) (<= code #x203a))
+     (svref +unicode-2013-203a-cp1252+
+            (the ub8 (- code #x2013))))
+    ((= code #x20ac) #x80)
+    ((= code #x2122) #x99)
+    (t (handle-error))))

I hope you will find it useful.

Regards,

-- 
Nicolas Martyanoff
   http://codemore.org
   khaelin <at> gmail.com
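
A round-trip check is a cheap way to validate mapping tables like the ones in this patch. The sketch below is standalone and does not use babel. All names in it (*cp1252-high-table*, cp1252-decode-octet, cp1252-encode-code, cp1252-round-trip-p) are hypothetical, and it returns NIL for unmappable input instead of calling an error handler. It derives the encoder from the decode table by linear search, so the two directions cannot drift apart.

```lisp
;;; Standalone sketch; not babel's implementation. All names are hypothetical.

(defparameter *cp1252-high-table*
  #(#x20ac nil    #x201a #x0192 #x201e #x2026 #x2020 #x2021
    #x02c6 #x2030 #x0160 #x2039 #x0152 nil    #x017d nil
    nil    #x2018 #x2019 #x201c #x201d #x2022 #x2013 #x2014
    #x02dc #x2122 #x0161 #x203a #x0153 nil    #x017e #x0178)
  "Unicode code points for octets #x80..#x9F; NIL marks undefined octets.")

(defun cp1252-decode-octet (octet)
  "Return the code point for OCTET, or NIL if OCTET is undefined in CP1252."
  (if (<= #x80 octet #x9f)
      (svref *cp1252-high-table* (- octet #x80))
      octet))

(defun cp1252-encode-code (code)
  "Return the octet for CODE, or NIL if CODE has no CP1252 representation."
  (if (or (< code #x80) (<= #xa0 code #xff))
      code                              ; ASCII and Latin-1 upper half map to themselves
      (let ((index (position code *cp1252-high-table*)))
        (and index (+ index #x80)))))

(defun cp1252-round-trip-p ()
  "True if every defined octet decodes to a code point that encodes back to it."
  (loop for octet from 0 below 256
        for code = (cp1252-decode-octet octet)
        always (or (null code)
                   (eql (cp1252-encode-code code) octet))))
```

The linear search is slower than the range tables used in the patch. The point here is only to have an independent reference for comparing individual mappings, for example that U+02DC corresponds to octet #x98 and that U+00A0 maps to itself.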
Rob Blackwell | 6 Apr 12:07

octets-to-string with UTF8 and Byte Order Marker

Hi,

I have some byte arrays which are UTF-8 and some which are UTF-8 with a byte order mark (BOM).

I can convert these arrays to strings using

> (babel:octets-to-string foo)

and

> (babel:octets-to-string foo :start 3)

respectively, but I currently have to check for a BOM myself, like this:

> (subseq foo 0 3)
#(239 187 191)

If I use (babel:octets-to-string foo) on a byte array that starts with a BOM, my SBCL Lisp image dies.

Is there a better way to ask Babel to discover the correct encoding by looking for byte order marks? Ideally I'd like a single function call that works with any array, figures out which encoding is being used, and works whether or not a BOM is present.

Sorry if I'm missing something obvious; I'm a Babel newbie. Any guidance or code samples would be gratefully received.

Thanks,
Rob.
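
The manual check described in the question can be wrapped in a small helper. The following is a minimal sketch in plain Common Lisp, not a babel function: detect-bom and starts-with-p are hypothetical names, and the returned keywords are just labels chosen for this example. It recognizes only the standard Unicode BOMs and returns NIL when there is none, so it cannot guess the encoding of BOM-less data.

```lisp
;;; Standalone sketch; DETECT-BOM is a hypothetical helper, not part of babel.

(defun starts-with-p (octets prefix)
  "True if the vector OCTETS begins with the list of octets PREFIX."
  (let ((n (length prefix)))
    (and (>= (length octets) n)
         (every #'= prefix (subseq octets 0 n)))))

(defun detect-bom (octets)
  "Return two values: an encoding keyword (or NIL) and the BOM length in octets."
  (cond ((starts-with-p octets '(#xEF #xBB #xBF))      (values :utf-8 3))
        ;; Test UTF-32LE before UTF-16LE: its BOM also begins with FF FE.
        ((starts-with-p octets '(#xFF #xFE #x00 #x00)) (values :utf-32le 4))
        ((starts-with-p octets '(#x00 #x00 #xFE #xFF)) (values :utf-32be 4))
        ((starts-with-p octets '(#xFF #xFE))           (values :utf-16le 2))
        ((starts-with-p octets '(#xFE #xFF))           (values :utf-16be 2))
        (t (values nil 0))))

;; Possible use for the UTF-8 case from the message above:
;; (multiple-value-bind (encoding start) (detect-bom foo)
;;   (declare (ignore encoding))
;;   (babel:octets-to-string foo :start start))
```

Whether the keywords returned here match the encoding names a given babel version accepts is an assumption to check before passing them to :encoding. Note also that a UTF-16LE text whose first character is U+0000 is indistinguishable from a UTF-32LE BOM, a known ambiguity of BOM sniffing.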

 

Dmitry Ignatiev | 28 Oct 19:50

Unicode encodings with explicit endianness

Hello again, Luís

Could you please apply my Unicode-related patch soon?
I'm currently developing a library that provides bindings to the MS Windows API:
http://github.com/Lovesan/doors
I'm concerned about the patch because my library cannot operate correctly on Windows Unicode strings without UTF-16LE support in babel.

Dmitry Ignatiev | 3 Oct 10:44

UTF-16/32 encodings with explicit endianness

Hi there.
I've added support for UTF-16LE, UTF-16BE, UTF-32LE and UTF-32BE.
Here's the new version of enc-unicode.lisp

Attachment (enc-unicode.tar.gz): application/x-gzip, 7694 bytes
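
For readers unfamiliar with what "explicit endianness" means here, the following is a minimal standalone sketch of UTF-16 encoding with a chosen byte order and no BOM. It is not the attached enc-unicode.lisp and does not use babel. The function names and the :endianness keyword are hypothetical, and the input is assumed to be a sequence of valid Unicode scalar values (no surrogate code points).

```lisp
;;; Standalone sketch; hypothetical helpers, not babel's implementation.

(defun code-point-to-utf-16-units (code)
  "Return the list of 16-bit code units that represent CODE."
  (if (< code #x10000)
      (list code)
      (let ((v (- code #x10000)))
        (list (+ #xD800 (ash v -10))            ; high surrogate
              (+ #xDC00 (logand v #x3FF))))))   ; low surrogate

(defun encode-utf-16 (code-points &key (endianness :le))
  "Encode a sequence of code points as UTF-16 octets, without a BOM."
  (let ((octets '()))
    (map nil
         (lambda (code)
           (dolist (unit (code-point-to-utf-16-units code))
             (let ((hi (ldb (byte 8 8) unit))
                   (lo (ldb (byte 8 0) unit)))
               ;; OCTETS is built in reverse and flipped at the end.
               (ecase endianness
                 (:le (push lo octets) (push hi octets))
                 (:be (push hi octets) (push lo octets))))))
         code-points)
    (coerce (nreverse octets) '(vector (unsigned-byte 8)))))
```

For example, (encode-utf-16 '(#x41) :endianness :le) yields the octets 41 00, while :be yields 00 41. With an explicit byte order the decoder never has to inspect or skip a BOM.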
Vsevolod Dyomkin | 4 Aug 16:07

question about #\Nul char and Unicode

Hi,

I'm stuck on a problem: I'm using CL-ZMQ, which uses CFFI, which in turn uses BABEL for tasks such as FOREIGN-STRING-TO-LISP conversion.
There seems to be a problem with 0 (#\Nul) characters in such strings, as can be seen below:

Illegal :UTF-8 character starting at position 328.
   [Condition of type BABEL-ENCODINGS:INVALID-UTF8-CONTINUATION-BYTE]

Restarts:
...

Backtrace:
  0: ((LAMBDA (BABEL-ENCODINGS::SRC BABEL-ENCODINGS::START BABEL-ENCODINGS::END BABEL-ENCODINGS::DEST BABEL-ENCODINGS::D-START)) ..)
  1: (CFFI:FOREIGN-STRING-TO-LISP #.(SB-SYS:INT-SAP #X0808E13C))[:EXTERNAL]
...

The translated string in the current example is this:
#(#\5 #\4 #\c #\6 #\7 #\5 #\5 #\b #\- #\9 #\6 #\2 #\8 #\- #\4 #\0 #\a #\4 #\- #\9 #\a #\2 #\d #\- #\c #\c #\8 #\2 #\a #\8 #\1 #\6 #\3 #\4 #\5 #\e #\  #\1 #\8 #\  #\/ #\h #\a #\n #\d #\l #\e #\r #\t #\e #\s #\t #\  #\2 #\6 #\0 #\Space #\{ #\" #\P #\A #\T #\H #\" #\Space #\" #\/ #\h #\a #\n #\d #\l #\e #\r #\t #\e #\s #\t #\" #\, #\" #\M #\E #\T #\H #\O #\D #\" #\Space #\" #\G #\E #\T #\" #\, #\" #\V #\E #\R #\S #\I #\O #\N #\" #\Space #\" #\H #\T #\T #\P #\/ #\1 #\. #\1 #\" #\, #\" #\U #\R #\I #\" #\Space #\" #\/ #\h #\a #\n #\d #\l #\e #\r #\t #\e #\s #\t #\" #\, #\" #\P #\A #\T #\T #\E #\R #\N #\" #\Space #\" #\/ #\h #\a #\n #\d #\l #\e #\r #\t #\e #\s #\t #\" #\, #\" #\A #\c #\c #\e #\p #\t #\" #\Space #\" #\* #\/ #\* #\" #\, #\" #\H #\o #\s #\t #\" #\Space #\" #\l #\o #\c #\a #\l #\h #\o #\s #\t #\Space #\6 #\7 #\6 #\7 #\" #\, #\" #\U #\s #\e #\r #\- #\A #\g #\e #\n #\t #\" #\Space #\" #\c #\u #\r #\l #\/ #\7 #\. #\2 #\0 #\. #\0 #\  #\( #\i #\4 #\8 #\6 #\- #\p #\c #\- #\l #\i #\n #\u #\x #\- #\g #\n #\u #\) #\  #\l #\i #\b #\c #\u #\r #\l #\/ #\7 #\. #\2 #\0 #\. #\0 #\  #\O #\p #\e #\n #\S #\S #\L #\/ #\0 #\. #\9 #\. #\8 #\n #\  #\z #\l #\i #\b #\/ #\1 #\. #\2 #\. #\3 #\. #\4 #\  #\l #\i #\b #\i #\d #\n #\/ #\1 #\. #\1 #\5 #\  #\l #\i #\b #\s #\s #\h #\2 #\/ #\1 #\. #\2 #\. #\4 #\" #\} #\, #\0 #\Space #\, #\n #\S #\S #\L #\/ #\0 #\. #\Nul #\Nul)

Maybe someone here can explain why these 0 characters are not recognized as valid UTF-8?

Thanks!
Vsevolod

Andrey Moskvitin | 26 Dec 12:29

Version 0.3.1 is required

There have been a few changes since the last release, in particular cp1251 support, which is important for Russia. These changes are currently available only in the darcs version. What is preventing the release of version 0.3.1? It would simplify the distribution of some packages.


Andrey