Uros | 2 Jul 2003 14:32

wildcard search

Hello!

Is there any good solution for wildcard searching with tsearch?

For example, mini* would match every word beginning with mini,
and maybe mini? as well.

Or is LIKE the only possible way?
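
To make the LIKE fallback concrete, here is a minimal sketch of turning a user-style wildcard (mini*, mini?) into a LIKE pattern. The function name wildcard_to_like is made up for illustration and is not part of tsearch. The sketch assumes backslash is the LIKE escape character, which is the PostgreSQL default.

```python
def wildcard_to_like(pattern, escape="\\"):
    """Translate a shell-style wildcard into a SQL LIKE pattern.

    '*' becomes '%' (any run of characters) and '?' becomes '_' (exactly
    one character). Characters that are special to LIKE are escaped so
    that they match literally. Assumption: `escape` is the escape
    character the database uses for LIKE.
    """
    out = []
    for ch in pattern:
        if ch == "*":
            out.append("%")
        elif ch == "?":
            out.append("_")
        elif ch in ("%", "_", escape):
            out.append(escape + ch)
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(wildcard_to_like("mini*"))  # pattern for "begins with mini"
    print(wildcard_to_like("mini?"))  # pattern for "mini plus one character"
```

The result would be passed as a bound parameter to a query of the form "... WHERE word LIKE <parameter>" rather than pasted into the SQL text.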

-- 
Best regards,
 Uros

-------------------------------------------------------
This SF.Net email sponsored by: Free pre-built ASP.NET sites including
Data Reports, E-commerce, Portals, and Forums are available now.
Download today and enter to win an XBOX or Visual Studio .NET.
http://aspnet.click-url.com/go/psa00100006ave/direct;at.asp_061203_01/01
Oleg Bartunov | 2 Jul 2003 14:51

Re: wildcard search

On Wed, 2 Jul 2003, Uros wrote:

> Hello!
>
> Is there any good solution with wildcard searching with tsearch.
>
> For example mini* will search every word beginning with mini
> maybe also mini?

Prefix search isn't available in tsearch and I'm afraid
it'd have a high priority in our TODO.

>
> Or there's only possible way to use LIKE
>
>

	Regards,
		Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@..., http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


Uros | 2 Jul 2003 14:57

Re[2]: wildcard search

Hello,

Wednesday, July 2, 2003, 2:51:59 PM, you wrote:

OB> On Wed, 2 Jul 2003, Uros wrote:

OB> Prefix search isn't available in tsearch and I'm afraid
OB> it'd have a high priority in our TODO.

Great, I think that tsearch will someday be the most advanced searcher.
Is it possible to see the TODO?

-- 
Best regards,
 Uros                            mailto:uros@...

Oleg Bartunov | 2 Jul 2003 15:01

Re[2]: wildcard search

On Wed, 2 Jul 2003, Uros wrote:

> Hello,
>
> Wednesday, July 2, 2003, 2:51:59 PM, you wrote:
>
> OB> On Wed, 2 Jul 2003, Uros wrote:
>
>
> OB> Prefix search isn't available in tsearch and I'm afraid
> OB> it'd have a high priority in our TODO.
>
> great, I think that tsearch will someday be the most advanced searcher.
> Is it possible to see TODO?

No. It's not written yet. In short, we're thinking about a more flexible
parser and dictionary support.

>
>

	Regards,
		Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@..., http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


Denis Braekhus | 2 Jul 2003 16:01

Re: wildcard search / Norwegian challenge


On Wednesday 02 July 2003 14:57, Uros wrote:
> OB> Prefix search isn't available in tsearch and I'm afraid
> OB> it'd have a high priority in our TODO.
>
> great, I think that tsearch will someday be the most advanced searcher.
> Is it possible to see TODO?

I am not sure of this but the way Oleg wrote the sentence it seems to me he 
meant it does _not_ have a high priority on the TODO .. 

On a side note, I have an interesting problem with the Norwegian language, and
have actually come up with several possible solutions .. I thought of it when
I saw this prefix search question, because that is one possible route ..
The problem is the grammatical rule of almost always concatenating words to
make new words, unlike English. Example :

English : arcade game, cellphone dealer, phone number
Norwegian : arkadespill, mobiltelefonforhandler, telefonnummer

What does this have to do with OpenFTS / Tsearch ? Well, the result is that on
our Norwegian website we will see both :

- Many more unique words
- Fewer hits than ideal, because a search for arkade will not match arkadespill
  (in English a search for arcade would match arcade game .. )

So I have been wondering what to do, how to split these words into smaller 
pieces. My thoughts so far :


Uros | 2 Jul 2003 16:20

Re[2]: wildcard search / Norwegian challenge

Hello,

Huh, I see that you have thought a lot about this problem. I have the same
problems with the Slovenian language. I tried some dictionary parsing, but I
don't like the tsearch parser, because when I convert words with to_query it
puts in the wrong operator. Let me explain.

Let's say I search for the word "psi", which is "dogs" in English. The
possible words, if I use the dictionary parser, are
pes, psa, psu, psov.... but the query is converted into

psi & pes & psa & psu

So nothing is found, because these are alternatives and not words that must
all appear in one document. It would be better to generate an OR between the
possible variants of the original word from the dictionary. Something like this

psi | (pes | psa | psu)

Also, results containing psi itself have to be ranked higher than those with
pes, psa, psu ....
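
A minimal sketch of the idea above, written outside the database so that the difference between the two operators is easy to see. All function names are hypothetical and this is not tsearch code. The weights 2.0 and 1.0 are arbitrary values chosen only to show the typed word ranking above its dictionary variants.

```python
def and_query(terms):
    """Query text in which every term is required (the behaviour complained about)."""
    return " & ".join(terms)


def or_expanded_query(original, variants):
    """Query text in which the typed word OR any dictionary variant may match."""
    if not variants:
        return original
    return f"{original} | ({' | '.join(variants)})"


def matches_all(doc_words, terms):
    """AND semantics: the document must contain every term."""
    return all(t in doc_words for t in terms)


def matches_any(doc_words, terms):
    """OR semantics: one matching term is enough."""
    return any(t in doc_words for t in terms)


def score(doc_words, original, variants, exact_weight=2.0, variant_weight=1.0):
    """Toy ranking: the typed word counts more than each variant found."""
    total = exact_weight if original in doc_words else 0.0
    total += variant_weight * sum(1 for v in variants if v in doc_words)
    return total


if __name__ == "__main__":
    variants = ["pes", "psa", "psu"]
    doc = {"lep", "pes"}  # a document that contains only one form of the word
    print(and_query(["psi"] + variants))
    print(or_expanded_query("psi", variants))
    print(matches_all(doc, ["psi"] + variants), matches_any(doc, ["psi"] + variants))
```

With AND, a document has to contain every inflected form at once, which almost never happens; with OR, any single form is enough, and the score keeps exact matches on top.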

I hope my explanation is clear enough. Maybe this is planned in the TODO, as
Oleg said: "In short we're thinking about more flexible parser and
dictionary support."

I don't know if it's possible to have more weights (I'm speaking of A, B, C
and D) so that a document can be weighted more accurately. Let's say I know
the title, description, url, body and keywords, so there are five of them.
But this has nothing to do with the original problem here, I just mention it.
I can put some ideas in another mail and we can discuss them.


Oleg Bartunov | 2 Jul 2003 16:21

Re: wildcard search

On Wed, 2 Jul 2003, Brandon Craig Rhodes wrote:

> Oleg Bartunov <oleg@...> writes:
>
> >> For example mini* will search every word beginning with mini
> >> maybe also mini?
> >
> > Prefix search isn't available in tsearch and I'm afraid
> > it'd have a high priority in our TODO.
>
> In English `high' priorities are good because they get done quickly;
> `low' priorities are bad because they are neglected.  Hence the
> positive response the user had to your response that this was a high
> priority, whereas I think you might have meant low. :-)
>

You're right. I meant prefix search has a low priority. :)

>

	Regards,
		Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@..., http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


Oleg Bartunov | 2 Jul 2003 16:32

Re: wildcard search / Norwegian challenge

On Wed, 2 Jul 2003, Denis Braekhus wrote:

> On Wednesday 02 July 2003 14:57, Uros wrote:
> > OB> Prefix search isn't available in tsearch and I'm afraid
> > OB> it'd have a high priority in our TODO.
> >
> > great, I think that tsearch will someday be the most advanced searcher.
> > Is it possible to see TODO?
>
> I am not sure of this but the way Oleg wrote the sentence it seems to me he
> meant it does _not_ have a high priority on the TODO ..
>
> On a sidenote, I have an interesting problem with the norwegian language, and
> have come up with several possible solutions actually .. I thought of it when
> I saw this prefix search questions because that is one possible route ..
> The problem is the grammatical rule to almost always concatenate words to make
> new words, unlike english. Example :
>
> English : arcade game, cellphone dealer, phone number
> Norwegian : arkadespill, mobiltelefonforhandler, telefonnummer
>
> What does this have to do with OpenFTS / Tsearch ? Well, the result is that in
> our Norwegian website we will both see :
>
> - Many more unique words
> - Fewer hits than ideal because a search for arkade will not match arkadespill
> (In English a search for arcade would match arcade game .. )

Teodor Sigaev | 2 Jul 2003 16:44

Re: wildcard search / Norwegian challenge

> On a sidenote, I have an interesting problem with the norwegian language, and 
> have come up with several possible solutions actually .. I thought of it when 
> I saw this prefix search questions because that is one possible route ..
> The problem is the grammatical rule to almost always concatenate words to make 
> new words, unlike english. Example : 

IMHO, the only real way is to write a dictionary which splits a word into
several "basic" words and returns them. Tsearch v2 allows a dictionary to
return several base forms.

So the problem is how to do it. Look at http://folk.uio.no/runekl/dictionary.html,
archive ispell-norsk-2.0.tar.gz. It contains the file norsk.words.sq, which has
information about each word and how it can be combined. So you must write some
code which can identify the correct parts. Sorry, but the ispell dictionary in
tsearch v2 can't work correctly with it.
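
As a rough illustration of what such splitting code might look like, here is a minimal sketch that cuts a compound into known base words by trying the longest candidate first and backtracking. The word list BASE_WORDS is a tiny hypothetical stand-in. The sketch does not read norsk.words.sq, and it ignores the combination rules stored there as well as any linking letters between parts.

```python
from functools import lru_cache

# Hypothetical mini word list; a real one would be loaded from a dictionary file.
BASE_WORDS = frozenset(
    {"arkade", "spill", "mobil", "telefon", "forhandler", "nummer"}
)


def split_compound(word, base_words=BASE_WORDS, min_len=3):
    """Return the base words whose concatenation is `word`, or None.

    Longer pieces are tried first, and the search backtracks when the
    remainder of the word cannot be split into known base words.
    """
    word = word.lower()

    @lru_cache(maxsize=None)
    def solve(start):
        if start == len(word):
            return ()  # reached the end: nothing left to split
        for end in range(len(word), start + min_len - 1, -1):
            piece = word[start:end]
            if piece in base_words:
                rest = solve(end)
                if rest is not None:
                    return (piece,) + rest
        return None  # no known base word starts at this position

    parts = solve(0)
    return list(parts) if parts is not None else None


if __name__ == "__main__":
    for w in ("arkadespill", "mobiltelefonforhandler", "telefonnummer"):
        print(w, "->", split_compound(w))
```

A dictionary built this way would hand all the returned parts back to the indexer, so that a search for arkade can hit a document containing arkadespill.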

If you're interested in support of this feature, please write a private message 
to us (Teodor, Oleg)

-- 
Teodor Sigaev                                  E-mail: teodor@...

Denis Braekhus | 2 Jul 2003 16:45

Re: wildcard search / Norwegian challenge


On Wednesday 02 July 2003 16:32, Oleg Bartunov wrote:
> Did you try ispell dictionary ? It should recognize word stem. In your
> case dictionary should return several forms
> mobiltelefonforhandler -> mobil, telefon, handler
> tsearch supports dictionaries which return several stems.

I will try, and as this is the cleanest and best way I hope it will work. 
We have simply not had time to think too much about the search lately, but as 
I am doing this new project now, and my colleague is also working with a new 
search this is ever more interesting.

Thanks for your input, actually I was not aware this capability was already 
there. 

Regards
-- 
Denis Brækhus - ABC Startsiden AS
http://www.startsiden.no

"`In those days spirits were brave, the stakes were high, 
men were REAL men, women were REAL women, and small furry 
creatures from Alpha Centauri were REAL small furry 
creatures from Alpha Centauri.'" 
(Hitchhikers Guide To The Galaxy)
