Oleg Bartunov | 5 Dec 2003 13:10

OpenFTS-perl-0.35 released

The OpenFTS development team is proud to announce the release
of OpenFTS 0.35 (Perl version), an open-source full-text search engine
for PostgreSQL.

Download from http://sourceforge.net/project/showfiles.php?group_id=30968

Major changes:

 * uses contrib/tsearch2
 * ispell dictionary supports compound words
   (sponsored by ABC Startsiden)
 * uses the ranking function from contrib/tsearch2

OpenFTS Web site - http://openfts.sourceforge.net/

	Regards,
		Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@..., http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

-------------------------------------------------------
This SF.net email is sponsored by: SF.net Giveback Program.
Does SourceForge.net help you be more productive?  Does it
help you create better code?  SHARE THE LOVE, and help us help
YOU!  Click Here: http://sourceforge.net/donate/
yhafri | 6 Dec 2003 12:02

OpenFTS-general - Optimized Data Structure

Hi All,

After a long search for a data structure that is optimized for both speed and space, I
found the "Judy Array Library" at http://judy.sourceforge.net/ .
This data structure is a highly optimized dynamic array (implemented behind the scenes by 25 algorithms on
32-bit systems and 85 on 64-bit systems) which can be used to create AVL trees, binary trees, B-trees,
hashes ...
In all these cases, Judy arrays outperform the classical structures in both access speed and memory size.

Please have a look at
"A 10 MINUTE TECHNICAL DESCRIPTION" at http://judy.sourceforge.net/downloads/10minutes.htm

and "A 3 HOUR TECHNICAL DESCRIPTION" at http://judy.sourceforge.net/application/shop_interm.pdf

It might be interesting to test Judy arrays and compare them with the "tsearch2" structure.

Best Regards,
Younès

Fred Fung | 11 Dec 2003 16:09

Snowball French stemming

Good Day,
 
I don't know if I should ask this question here or at the Snowball mailing list, but I am going to try here first since it is more related to OpenFTS.
 
I am using OpenFTS 0.35 with the Snowball French stemming algorithm and the Snowball wrapper downloaded from the snowball.tartarus.org site. With this algorithm, the word "française" is stemmed to "français", which is perfect. However, when I stem the word "français", it becomes "franc".

I looked again at the Snowball site. In their sample list of French vocabulary and its stemmed equivalents, "français" is indeed stemmed to "franc".

But here is the problem: when I convert a piece of text containing the word "française" into its tsvector equivalent and later search the table for the word "français", I would expect this text to be a match as well. It is not (I have tried it), because "français" is stemmed to "franc" before the search starts and so never matches the stem "français" stored in the tsvector field.
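
The mismatch can be reproduced with a toy model. The sketch below is in Python rather than the Perl used by OpenFTS, and `REPORTED_STEMS` is a hypothetical lookup table holding only the two stems quoted in this message; it is not the Snowball algorithm. It only illustrates that when the stem produced for one form is itself changed by another pass of stemming, the indexed stem and the query stem diverge.

```python
# Toy stand-in for the stemmer: a lookup table with only the two results
# reported in this thread. This is NOT the Snowball algorithm.
REPORTED_STEMS = {
    "française": "français",
    "français": "franc",
}


def stem(word):
    """Return the reported stem, or the lowercased word if it is not listed."""
    word = word.lower()
    return REPORTED_STEMS.get(word, word)


def index_terms(text):
    """Stem every whitespace-separated word, as would happen at indexing time."""
    return {stem(w) for w in text.split()}


def matches(query_word, indexed):
    """Stem the query word the same way and look it up among the indexed stems."""
    return stem(query_word) in indexed


indexed = index_terms("la langue française")
print(matches("française", indexed))  # True: both sides become "français"
print(matches("français", indexed))   # False: the query becomes "franc"
```

In this model the search fails exactly as described: the document holds the stem "français", while the query for "français" is reduced one step further to "franc".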
 
Is this something one has to live with when using this French stemming algorithm? If not, is there any way to work around the problem I mentioned here?
 
Thanks.
 
 
Fred 
Oleg Bartunov | 11 Dec 2003 16:30

Re: Snowball French stemming

Fred,

I see your problem. This is mostly a Snowball question, so I'd recommend asking
Martin Porter. As a workaround, I suggest you create a simple dictionary and
let it recognize your problem word(s) before the Snowball stemmer.
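
A minimal sketch of this workaround, assuming a dictionary chain in which the first dictionary that recognizes a word decides its lexeme. Everything here is a hypothetical illustration in Python: `EXCEPTIONS`, `simple_dict`, `toy_stemmer`, and `normalize` are made-up names, and the stemmer is a lookup table of the two stems reported in this thread, not Snowball itself.

```python
# Toy stemmer: only the two stems reported in this thread (not Snowball).
REPORTED_STEMS = {"française": "français", "français": "franc"}

# Hypothetical exception list: problem words and the lexeme they should map to.
EXCEPTIONS = {"français": "français", "française": "français"}


def simple_dict(word):
    """Return the lexeme for a listed problem word, or None if not recognized."""
    return EXCEPTIONS.get(word)


def toy_stemmer(word):
    """Always returns something: the reported stem or the word unchanged."""
    return REPORTED_STEMS.get(word, word)


def normalize(word, dictionaries):
    """Ask each dictionary in order; the first non-None answer wins."""
    word = word.lower()
    for dictionary in dictionaries:
        lexeme = dictionary(word)
        if lexeme is not None:
            return lexeme
    return word


chain = [simple_dict, toy_stemmer]  # the simple dictionary comes first
print(normalize("française", chain))
print(normalize("français", chain))
```

With the simple dictionary placed first, both forms normalize to the same lexeme, so a document containing one form can be found by a query for the other. Words that are not listed fall through to the stemmer as before.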

	Oleg

Fred Fung | 11 Dec 2003 16:42

Re: Snowball French stemming

Thanks, Oleg. I will email the Snowball mailing list.

Fred

Teodor Sigaev | 11 Dec 2003 16:48

Re: Snowball French stemming

For common words, you can solve the problem with an ispell dictionary; see
http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/

-- 
Teodor Sigaev                                  E-mail: teodor@...

Fred Fung | 11 Dec 2003 17:38

Re: Snowball French stemming

Is it as simple as just setting one of the 'dict' entries in fts_conf to
point to Search::OpenFTS::Morph::ISpell ?

And where should 'french.dict' and 'french.aff' be installed?

Thanks.

Fred

Teodor Sigaev | 11 Dec 2003 18:00

Re: Snowball French stemming


Fred Fung wrote:
> Is it as simple as just setting one of the 'dict' entries in fts_conf to
> point to Search::OpenFTS::Morph::ISpell ?

That is the simplest way, yes, but it isn't optimal (BTW, the ISpell dictionary
should be first).

A good way is something like this:

     my $idx = Search::OpenFTS::Index->init(
         dbi             => $dbi,
         txttid          => 'texts.tid',
         tsvector_field  => 'fts_index',
         ignore_id_index => [qw( 7 13 14 12 23 )],
         ignore_headline => [qw(13 15 16 17 5)],
         map             => '{
              \'1\'=>[0,1],   #LATWORD
              2=>[0,1],       #CYRWORD
              3=>[0,1],       #UWORD
              4=>[2],         #EMAIL
              5=>[2],         #FURL
              6=>[2],         #HOST
              7=>[2],         #SCIENTIFIC
              8=>[2],         #VERSIONNUMBER
              9=>[0,1],       #PARTHYPHENWORD
              10=>[0,1],      #CYRPARTHYPHENWORD
              11=>[0,1],      #LATPARTHYPHENWORD
              15=>[0,1],      #HYPHENWORD
              16=>[0,1],      #LATHYPHENWORD
              17=>[0,1],      #CYRHYPHENWORD
              18=>[2],        #URI
              19=>[2],        #FILEPATH
              20=>[2],        #DECIMAL
              21=>[2],        #SIGNEDINT
              22=>[2],        #UNSIGNEDINT
         }',
         dict => [
             {
                 mod=>'Search::OpenFTS::Morph::ISpell',
                 param=>'{
                   dict_file=>"/foo/french.dict",
                   aff_file=>"/foo/french.aff",
                   stop_file=>"/foo/french.stop"
                 }'
             },
             {
                 mod=>'Search::OpenFTS::Dict::Snowball',
                 param=>'{
                   lang=>"french",
                   stop_file=>"/foo/french.stop"
                 }'
             },
             {
                 mod   => 'Search::OpenFTS::Dict::UnknownDict',
                 param => "{}"
             },
         ]
     );

You can read more about configuration in doc/primer.html
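
To make the roles of `map` and `dict` more concrete, here is a rough Python sketch. It assumes that each `map` entry lists indexes into the `dict` array and that those dictionaries are tried in order until one recognizes the token; doc/primer.html remains the authority on the actual behaviour. The three dictionary functions are toy stand-ins with made-up names, not the real ISpell, Snowball, or UnknownDict modules.

```python
def ispell_like(word):
    """Stand-in for dictionary 0: knows a tiny hypothetical word list only."""
    known = {"français": "français", "française": "français"}
    return known.get(word)  # None means "not recognized, try the next one"


def snowball_like(word):
    """Stand-in for dictionary 1: a toy suffix stripper, not Snowball."""
    if word.endswith("s") and len(word) > 3:
        return word[:-1]
    return word


def unknown_dict(word):
    """Stand-in for dictionary 2: accepts any token unchanged."""
    return word


DICTS = [ispell_like, snowball_like, unknown_dict]

# Token type id -> indexes into DICTS, mirroring two entries of the map above.
TOKEN_MAP = {
    1: [0, 1],  # LATWORD: ISpell-like first, then the stemmer
    4: [2],     # EMAIL: only the catch-all dictionary
}


def lexize(token_type, token):
    """Return the lexeme for a token, or None if its type has no map entry."""
    for index in TOKEN_MAP.get(token_type, []):
        lexeme = DICTS[index](token.lower())
        if lexeme is not None:
            return lexeme
    return None


print(lexize(1, "Française"))
print(lexize(1, "mots"))
print(lexize(4, "fred@example.org"))
```

In this model a word known to the first dictionary never reaches the stemmer, which is why the ISpell dictionary is listed first, while e-mail addresses skip both word dictionaries and are kept as they are.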

> And where should 'french.dict' and 'french.aff' be installed to ?
Anywhere on your disk.

-- 
Teodor Sigaev                                  E-mail: teodor@...

Fred Fung | 11 Dec 2003 18:14

Re: Snowball French stemming

Thanks Teodor.

Fred

Janine Sisk | 12 Dec 2003 16:10

Trouble compiling on RH Enterprise

Hello,

I'm attempting to build Search-OpenFTS-tcl-0.3.2 on RedHat Enterprise 
Server (with pg 7.2.4), for use with OpenACS.  Following the directions 
posted several places "out there" I have edited Makefile.global to 
specify where to find tcl.h, but I am getting (apparently bogus) syntax 
errors.

If I use the tcl.h from Tcl 8.3.5 I get

make[1]: Entering directory `/usr/local/src/Search-OpenFTS-tcl-0.3.2/parser'
gcc -c -I. -fPIC -I../include -I/usr/local/src/tcl8.3.5/generic 
-DPACKAGE=\"OPENFTS\" -DVERSION=\"0.3.2\" -DHAVE_UNISTD_H=1 
-DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 
-DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 
-DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 
-DHAVE_UNISTD_H=1 -DSTDC_HEADERS=1 -DHAVE_STRERROR=1 -DHAVE_STRSTR=1 
-DHAVE_STRLEN=1 -DHAVE_POLL=1 Parser.c -o Parser.o
In file included from ../include/iispell.h:5,
                from ../include/fts.h:6,
                from Parser.c:28:
/usr/local/src/tcl8.3.5/generic/regex.h:145: syntax error before "re_void"
/usr/local/src/tcl8.3.5/generic/regex.h:145: warning: data definition has no type or storage class
/usr/local/src/tcl8.3.5/generic/regex.h:314: syntax error before "_ANSI_ARGS_"
/usr/local/src/tcl8.3.5/generic/regex.h:323: syntax error before "_ANSI_ARGS_"
/usr/local/src/tcl8.3.5/generic/regex.h:326: syntax error before "_ANSI_ARGS_"

and if I use the one in aolserver (3.3+ad13) it's

make[1]: Entering directory `/usr/local/src/Search-OpenFTS-tcl-0.3.2/parser'
gcc -c -I. -fPIC -I../include 
-I/usr/local/src/aolserver/aolserver/include -DPACKAGE=\"OPENFTS\" 
-DVERSION=\"0.3.2\" -DHAVE_UNISTD_H=1 -DSTDC_HEADERS=1 
-DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 
-DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 
-DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 
-DSTDC_HEADERS=1 -DHAVE_STRERROR=1 -DHAVE_STRSTR=1 -DHAVE_STRLEN=1 
-DHAVE_POLL=1 Parser.c -o Parser.o
Parser.c: In function `Fts_GetDescriptObjCmd':
Parser.c:41: syntax error before "Tcl_Obj"
Parser.c:49: subscripted value is neither array nor pointer
Parser.c: In function `Fts_GetLexObjCmd':
Parser.c:65: syntax error before "Tcl_Obj"
(it goes on for many more lines like this)

Any ideas what I might be doing wrong?  I should mention that the same 
source builds just fine on a RedHat 8 system.

thanks,

janine
