dai | 2 Oct 2003 00:52

Re: splitting string with null string

Hi, this is dai inukai.

It seems reasonable to me for the procedure to return either #f or
the given argument itself. Returning the given argument as is seems
the more reasonable of the two, since in my view the null string
matches nothing.

The latter seems more reasonable because a returned #f is, in my
opinion, likely to be treated as an error and to stop the program
from going ahead, whereas returning the argument lets the program
carry on and do the best it can.

Alex Shinn wrote:

> SCM returns #("abc") like Chicken (but as a vector), and also reverses
> the order of the arguments.  For (string-split "/" "/a/b/c/") it gives
> #("" "a" "b" "c") including the leading but not trailing null string.
> 
> G

This is not the SCM behaviour but that of the POSIX regex function it
calls.  The returned value changes depending on the library it calls.

dai -- http://www4.ocn.ne.jp/~inukai/

-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf

Kimura Fuyuki | 4 Oct 2003 13:29

missing constant

F_SETFD is missing from the export list of the fcntl module.

Index: fcntl.scm
===================================================================
RCS file: /cvsroot/gauche/Gauche/ext/fcntl/fcntl.scm,v
retrieving revision 1.3
diff -u -r1.3 fcntl.scm
--- fcntl.scm	5 Jul 2003 03:29:10 -0000	1.3
+++ fcntl.scm	4 Oct 2003 11:24:15 -0000
@@ -38,7 +38,7 @@
   (export <sys-flock>
           sys-fcntl

-          |F_DUPFD|  |F_GETFD|  |F_GETLK|  |F_GETFL|  |F_SETFL|
+          |F_DUPFD|  |F_GETFD|  |F_SETFD|  |F_GETFL|  |F_SETFL|
           |F_GETLK|  |F_SETLK|  |F_SETLKW|
           |F_RDLCK|  |F_WRLCK|  |F_UNLCK|  |FD_CLOEXEC|
           |O_RDONLY| |O_WRONLY| |O_RDWR|   |O_APPEND| |O_CREAT|
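
For readers wondering why the missing export matters: F_GETFD and F_SETFD are normally used as a pair to toggle FD_CLOEXEC. The sketch below shows that read-modify-write pattern using Python's fcntl module rather than Gauche's sys-fcntl; the helper names set_close_on_exec and is_close_on_exec are made up for illustration, and a POSIX system is assumed.

```python
import fcntl
import os


def set_close_on_exec(fd, enabled):
    """Set or clear FD_CLOEXEC on fd, leaving other descriptor flags alone."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)   # read the current descriptor flags
    if enabled:
        flags |= fcntl.FD_CLOEXEC
    else:
        flags &= ~fcntl.FD_CLOEXEC
    fcntl.fcntl(fd, fcntl.F_SETFD, flags)    # write the updated flags back


def is_close_on_exec(fd):
    """Report whether FD_CLOEXEC is currently set on fd."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)
```

Without F_SETFD the second half of this pattern cannot be written, which is presumably what prompted the patch.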

-- fuyuki

Shiro Kawai | 3 Oct 2003 12:08

Re: splitting string with null string

Thanks for all valuable replies.

Returning ("a" "b" "c") is certainly ad hoc, in that it is
inconsistent with the other cases, and this particular
functionality can be achieved easily by (map string (string->list s)),
(or, if you use gauche.sequence, it would be (map string s)),
so I'll drop that option.

Returning "abc", i.e. treating the null string as matching
nothing, seems plausible, though it contradicts other
parts of the library, for example:

 ;; srfi-13: null string is a part of any string.
 (string-prefix? "" "abc") => #t 
 (string-contains "abc" "") => 0

 ;; regexp: empty regexp matches any string.
 (rxmatch #// "abc") => #<<regmatch> ...>

So, at least for the time being, I'll raise an error when
the splitter matches the empty string.  It is easier
to loosen the restriction later than to tighten it.
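
As a point of comparison only (this is Python, not Gauche), Python's standard library happens to land on the same combination: the empty string is a prefix and substring of everything, an empty regexp matches anywhere, and splitting on an empty separator is an error. The helper name split_rejects_empty is hypothetical.

```python
import re

# The empty string is a prefix of, and is found at index 0 in, any string.
assert "abc".startswith("")
assert "abc".find("") == 0

# An empty regular expression matches at the start of any string.
assert re.match("", "abc") is not None

# Breaking a string into characters needs no splitter at all.
assert list("abc") == ["a", "b", "c"]


def split_rejects_empty(s):
    """Return True if str.split refuses an empty separator for s."""
    try:
        s.split("")
    except ValueError:
        return True
    return False
```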

--shiro


Kimura Fuyuki | 5 Oct 2003 11:19

unsetenv

I noticed Gauche lacks a means to unset an environment variable, so
I added one, along with a setenv() interface and configure tweaks to
work with autoheader 2.57. (Was that the right thing to do?)
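
To illustrate the semantics being asked for, here is how the same set/unset pair looks in Python's os module. This is only an analogy: the function names set_env and unset_env are hypothetical and say nothing about how the Gauche interface in the patch is spelled.

```python
import os


def set_env(name, value):
    """Set an environment variable for this process and its future children."""
    os.environ[name] = value


def unset_env(name):
    """Remove an environment variable; do nothing if it is not set."""
    os.environ.pop(name, None)
```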

-- fuyuki

Index: configure.ac
===================================================================
RCS file: /cvsroot/gauche/Gauche/configure.ac,v
retrieving revision 1.12
diff -u -r1.12 configure.ac
--- configure.ac	3 Oct 2003 10:59:46 -0000	1.12
+++ configure.ac	5 Oct 2003 09:14:52 -0000
@@ -21,14 +21,14 @@
 or 'no' (No multibyte support)]),
   [
    case $enable_multibyte in
-     euc-jp|eucjp|yes)  AC_DEFINE(GAUCHE_CHAR_ENCODING_EUC_JP);;
-     utf-8|utf8)        AC_DEFINE(GAUCHE_CHAR_ENCODING_UTF_8);;
-     sjis|shift-jis)    AC_DEFINE(GAUCHE_CHAR_ENCODING_SJIS);;
+     euc-jp|eucjp|yes)  AC_DEFINE(GAUCHE_CHAR_ENCODING_EUC_JP,1,[Define if Gauche handles multi-byte character as EUC-JP]);;
+     utf-8|utf8)        AC_DEFINE(GAUCHE_CHAR_ENCODING_UTF_8,1,[Define if Gauche handles multi-byte character as UTF-8]);;
+     sjis|shift-jis)    AC_DEFINE(GAUCHE_CHAR_ENCODING_SJIS,1,[Define if Gauche handles multi-byte character as Shift JIS]);;
      no|none) ;;
      *) echo "unrecognized encoding option: '$enable_multibyte'; type ./configure --help for available options"
         exit 1;;
    esac

Shiro Kawai | 5 Oct 2003 11:28

Re: unsetenv

0.7.2 is now in the final release stage, so sys-unsetenv won't be in,
but I'll add it after I release 0.7.2.

Autoheader screws up config.h, so I don't use it anymore.
(What I don't like about it is that it puts all AC_DEFINEd
symbols into one place, while I want to keep separate
config.h files for extension modules.  It also makes manual
postprocessing difficult by reordering those symbols.)

--shiro

Shiro Kawai | 5 Oct 2003 13:31

Gauche-0.7.2, Gauche-gl-0.3 and Gauche-gtk-0.3.1

New releases:

Gauche-0.7.2 - A major maintenance release, including
    many bug fixes since 0.7.1, performance improvements
    to some procedures, and a couple of
    new modules.  
    Please see http://www.shiro.dreamhost.com/scheme/gauche/index.html
    for the details.

Gauche-gl-0.3 - Now it runs on Cygwin with opengl-1.1.0-6.
    I haven't tried with XFree86/Cygwin, though.

Gauche-gtk-0.3.1 - It now works with Gtk-2.2.

enjoy.

--shiro

Kimura Fuyuki | 6 Oct 2003 05:25

string-split vs. string-tokenize

The Gauche reference manual states the following:

	Note: When splitter is a char-set, string-split is equivalent
	to string-tokenize except the meaning of the char-set is
	reversed.

It is not true in some boundary cases.

(string-split "" (char-set-complement char-set:graphic))
=> ("")
(string-tokenize "")
=> ()

(string-split " a " (char-set-complement char-set:graphic))
=> ("" "a" "")
(string-tokenize " a ")
=> ("a")

This is the same difference in behavior as between Python's split()
and Perl's and Ruby's.

Does this need another vote? I prefer string-tokenize's behavior, just
because it fits in the code I'm writing now. ;)
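
Both behaviors can be seen side by side inside Python itself, which may help when voting: split with an explicit separator keeps the empty edge fields (like string-split above), while split with no argument drops them (like string-tokenize). The two wrapper names below are hypothetical.

```python
def split_on_space(s):
    """Explicit separator: empty fields at the edges are kept."""
    return s.split(" ")


def split_on_whitespace_runs(s):
    """No separator: runs of whitespace delimit, empty fields are dropped."""
    return s.split()


# Same two inputs as the boundary cases above.
assert split_on_space("") == [""]
assert split_on_whitespace_runs("") == []
assert split_on_space(" a ") == ["", "a", ""]
assert split_on_whitespace_runs(" a ") == ["a"]
```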

-- fuyuki


Shiro Kawai | 6 Oct 2003 12:05

Re: string-split vs. string-tokenize

From: Kimura Fuyuki <fuyuki <at> nigredo.org>
Subject: [Gauche-devel] string-split vs. string-tokenize
Date: Mon, 06 Oct 2003 12:25:53 +0900

> It is not true in some boundary cases.
> 
> (string-split "" (char-set-complement char-set:graphic))
> => ("")
> (string-tokenize "")
> => ()
> 
> (string-split " a " (char-set-complement char-set:graphic))
> => ("" "a" "")
> (string-tokenize " a ")
> => ("a")

I see.  I'll withdraw the description from the manual.

Conceptually, string-split returns what's remaining in
the string, after removing delimiter sequences.  
So I feel (string-split "" char-set:full) should return "",
since nothing is taken from the given string, "".

On the other hand, string-tokenize extracts tokens that
match the given charset, so (string-tokenize "") => () is
natural, since no token is matched in the string "".
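
The two views can be modeled with regular expressions, here in Python as a rough sketch rather than a description of Gauche's implementation: "split" removes delimiter runs and returns what is left, "tokenize" collects runs of non-delimiter characters. The names split_on and tokenize, and passing the delimiters as a non-empty string of characters, are assumptions made for this illustration.

```python
import re


def split_on(delims, s):
    """Remove runs of delimiter characters and return what remains."""
    return re.split("[" + re.escape(delims) + "]+", s)


def tokenize(delims, s):
    """Extract the maximal runs of characters that are not delimiters."""
    return re.findall("[^" + re.escape(delims) + "]+", s)


# Nothing is removed from "", so one (empty) piece remains;
# no token can be found in "", so nothing is extracted.
assert split_on(" ", "") == [""]
assert tokenize(" ", "") == []
assert split_on(" ", " a ") == ["", "a", ""]
assert tokenize(" ", " a ") == ["a"]
```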

--shiro


Kimura Fuyuki | 7 Oct 2003 05:30

regexp oddity?

I can't understand why the following regexp doesn't match. Am I doing
something wrong?

(#/^\[?([^\]]*)\]?:(\d+)$/ "127.0.0.1:80") => #f

It looks OK in other languages.

$ ruby -e 'p /^\[?([^\]]*)\]?:(\d+)$/ =~ "127.0.0.1:80"'
0

-- fuyuki

Alex Shinn | 7 Oct 2003 05:47

Re: regexp oddity?

>>>>> Kimura Fuyuki <fuyuki <at> nigredo.org> writes:

    > I can't understand why the following regexp doesn't
    > match. Am I doing something wrong?

    > (#/^\[?([^\]]*)\]?:(\d+)$/ "127.0.0.1:80") => #f

Works without the \[? ... \]?

  (#/^([^\]]*):(\d+)$/ "127.0.0.1:80") => #<<regmatch> 0x8177f48>

and also with them non-optional

  (#/^\[([^\]]*)\]:(\d+)$/ "[127.0.0.1]:80") => #<<regmatch> 0x8177f60>

so you can split it into two cases as a temporary workaround.

Not sure if this is related:

  (#/[^a]*./ "b") => #f
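
For what it's worth, both patterns match in Python's backtracking engine too, which is consistent with the regexps themselves being fine. In each case the greedy negated class first swallows too much and has to give characters back; whether that is where Gauche's matcher goes wrong is only a guess. The name parse_host_port is hypothetical.

```python
import re

# Same pattern as the original report: optional brackets, host, colon, port.
HOST_PORT = re.compile(r"^\[?([^\]]*)\]?:(\d+)$")


def parse_host_port(s):
    """Return (host, port) if s looks like host:port or [host]:port, else None."""
    m = HOST_PORT.match(s)
    if m is None:
        return None
    return (m.group(1), int(m.group(2)))


assert parse_host_port("127.0.0.1:80") == ("127.0.0.1", 80)
assert parse_host_port("[127.0.0.1]:80") == ("127.0.0.1", 80)

# The reduced case: [^a]* must give back the "b" so that "." can match it.
assert re.match(r"[^a]*.", "b").group(0) == "b"
```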

-- 
Alex
