Dave Killian | 9 Jun 01:18 2005

Automatically indexing rows

Hello,
I would like the tsvector field in my database to be updated automatically when a new row is inserted, rather than calling the OpenFTS index function.  Would the trigger outlined in the tsearch2 introduction do the job?  I saw a related thread advising the use of this trigger, but I would like to know how using that trigger differs from using the OpenFTS index function.  If they are different, is it possible to create a trigger that calls the OpenFTS index function?

I am new to all of this, so thank you for your patience and help.

Dave

Oleg Bartunov | 9 Jun 04:25 2005

Re: Automatically indexing rows

On Wed, 8 Jun 2005, Dave Killian wrote:

> Hello,
> I would like to have the tsvector field updated automatically in my database
> upon insertion of a new row rather than calling the OpenFTS index function.
> Would the trigger outlined in the tsearch2 introduction do the job? I saw a
> related thread advising the use of this trigger, but I would like to know
> what differences there would be between using that trigger and using the
> OpenFTS index function? If they would be different, is it possible to create

tsearch2 and OpenFTS use *different* configurations!

> a trigger that would call the OpenFTS index function?
>

I think it's possible with PL/Perl.

> I am new to all of this, so thank you for your patience and help.
>
> Dave
>

 	Regards,
 		Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@..., http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
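
For readers following this thread: the trigger Dave asks about is the tsearch2() trigger procedure from contrib/tsearch2. A minimal sketch follows, assuming a hypothetical table messages with a text column body and a tsvector column fti (none of these names come from the thread).

```sql
-- Hypothetical table; the names messages, body and fti are illustrative only.
CREATE TABLE messages (
    id   serial PRIMARY KEY,
    body text,
    fti  tsvector
);

-- tsearch2() is the trigger procedure provided by contrib/tsearch2.
-- Its first argument is the tsvector column; the rest are text columns to index.
CREATE TRIGGER messages_fti_update
    BEFORE INSERT OR UPDATE ON messages
    FOR EACH ROW EXECUTE PROCEDURE tsearch2(fti, body);

-- The trigger fills fti using the current tsearch2 configuration.
INSERT INTO messages (body) VALUES ('Automatically indexing rows');
SELECT fti FROM messages;
```

This keeps the vector in step with the row, but it parses the text with the tsearch2 configuration rather than the OpenFTS one, which is the difference pointed out above.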

hubert depesz lubaczewski | 11 Jun 15:35 2005

tsearch2/openfts and a database with texts in different languages?

Hi,
I guess this should be a simple question:
is it possible to use tsearch2 alone, or OpenFTS, in a database where I
have texts in different languages?
I.e. it is a database of technical texts written in at least 4
different languages (English, Polish, Russian and German).
More languages will probably be coming.
The question is: is it possible to set up tsearch2/OpenFTS so that they
allow me to search efficiently over this database?
I don't need stemmers (which might be a big point in this case), but
having them would be nice.

depesz

Markus Bertheau | 11 Jun 18:19 2005

Re: tsearch2/openfts and a database with texts in different languages?

You need to know which language each text is in, so you can tell tsearch
which stemmer to use. Other than that, it seems possible.

What are you doing? It sounds interesting.

Markus
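
One way to record the language per text is to store a tsearch2 configuration name in each row and pass it to to_tsvector(). The sketch below is only an illustration: the table docs, its columns and the trigger names are hypothetical. The configurations 'default', 'default_russian' and 'simple' ship with tsearch2; Polish and German configurations would have to be created first.

```sql
-- Hypothetical schema: docs, body, cfg and tsidx are illustrative names.
CREATE TABLE docs (
    id    serial PRIMARY KEY,
    body  text NOT NULL,
    cfg   text NOT NULL,   -- name of the tsearch2 configuration for this row
    tsidx tsvector
);

CREATE OR REPLACE FUNCTION docs_tsidx_update() RETURNS trigger AS $$
BEGIN
    -- to_tsvector(config_name, text) parses the text with the named configuration
    NEW.tsidx := to_tsvector(NEW.cfg, coalesce(NEW.body, ''));
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER docs_tsidx_trg
    BEFORE INSERT OR UPDATE ON docs
    FOR EACH ROW EXECUTE PROCEDURE docs_tsidx_update();

INSERT INTO docs (body, cfg) VALUES ('technical manuals', 'default');

-- Search one language at a time, building the query with the same configuration.
SELECT id FROM docs
 WHERE cfg = 'default'
   AND tsidx @@ to_tsquery('default', 'manual');
```

If stemming is not needed at all, every row could use 'simple' and the cfg column could be dropped.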

On Sat, 11 Jun 2005 at 15:35 +0200, hubert depesz lubaczewski
wrote:
> Hi,
> I guess this should be a simple question:
> is it possible to use tsearch2 alone, or OpenFTS, in a database where I
> have texts in different languages?
> I.e. it is a database of technical texts written in at least 4
> different languages (English, Polish, Russian and German).
> More languages will probably be coming.
> The question is: is it possible to set up tsearch2/OpenFTS so that they
> allow me to search efficiently over this database?
> I don't need stemmers (which might be a big point in this case), but
> having them would be nice.
> 
> depesz
> 
-- 
Markus Bertheau <twanger@...>
Oleg Bartunov | 11 Jun 20:56 2005

Re: tsearch2/openfts and a database with texts in different languages?

On Sat, 11 Jun 2005, hubert depesz lubaczewski wrote:

> Hi,
> I guess this should be a simple question:
> is it possible to use tsearch2 alone, or OpenFTS, in a database where I
> have texts in different languages?
> I.e. it is a database of technical texts written in at least 4
> different languages (English, Polish, Russian and German).
> More languages will probably be coming.

Why not? I suppose you understand how tsearch2 uses dictionaries.
Just keep in mind that a stemmer dictionary recognizes every word,
so it should be the last dictionary in the list, if you use one at all.

We plan to add full Unicode support and, probably, language
recognition.

 	Regards,
 		Oleg
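
The point about dictionary order can be illustrated with the pg_ts_cfgmap table, where each token type of a configuration is mapped to an ordered list of dictionaries. This is a sketch under one loud assumption: an ispell dictionary named en_ispell has already been registered by the user (it is not part of a stock install). en_stem is the stemmer shipped with tsearch2.

```sql
-- Try en_ispell first and fall back to the stemmer en_stem.
-- en_ispell is an assumed, user-created dictionary.
UPDATE pg_ts_cfgmap
   SET dict_name = '{en_ispell,en_stem}'
 WHERE ts_name = 'default'
   AND tok_alias IN ('lword', 'lhword', 'lpart_hword');

-- Inspect the resulting mapping.
SELECT tok_alias, dict_name
  FROM pg_ts_cfgmap
 WHERE ts_name = 'default'
 ORDER BY tok_alias;
```

Because the stemmer accepts any word, a dictionary listed after it would never be consulted.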

mike | 21 Jun 23:54 2005

snb_lexize() checks stopwords before stemming?


Hi,

We're searching an email archive using tsearch2 with PostgreSQL 8.0.3.
I'm tuning the index now for performance, and am marking some of our
most common words as stopwords. I found that snb_lexize() checks the
stopword list before stemming the word, and thus some stopwords are
indexed in their non-root forms; e.g. "use" is a stopword, but "using"
is indexed.

Is this intentional?

Another question - I've used the stat() function to analyze our index,
and we have about a million unique words. A lot of these are just
garbage, so I'll be working to weed them out, but is it more important
to cut down the number of unique words, or to reduce the most frequent
words?

Thanks,

-- 
  | Mike Acar |                                | mike at waspfactory dot org |

Oleg Bartunov | 22 Jun 08:56 2005

Re: snb_lexize() checks stopwords before stemming?

On Tue, 21 Jun 2005, mike-ofts@... wrote:

>
> Hi,
>
> We're searching an email archive using tsearch2 with PostgreSQL 8.0.3.
> I'm tuning the index now for performance, and am marking some of our
> most common words as stopwords. I found that snb_lexize() checks the
> stopword list before stemming the word, and thus some stopwords are
> indexed in their non-root forms; e.g. "use" is a stopword, but "using"
> is indexed.
>
> Is this intentional?

Yes. Read the "How tsearch2 dictionaries work with stop words" section of my
notes: http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_Notes
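
The behaviour Mike describes can be observed directly with lexize(), which runs a single word through one dictionary. The sketch assumes the stock en_stem dictionary and that "use" has been added to its stop-word file, as in Mike's setup. No output is shown because it depends on that file.

```sql
-- A stop word is expected to come back as an empty array.
SELECT lexize('en_stem', 'use');

-- The inflected form is not in the stop-word list, so it is stemmed and indexed.
SELECT lexize('en_stem', 'using');
```

One option for dropping the inflected forms as well is to list them in the stop-word file themselves.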

>
> Another question - I've used the stat() function to analyze our index,
> and we have about a million unique words. A lot of these are just
> garbage, so I'll be working to weed them out, but is it more important
> to cut down the number of unique words, or to reduce the most frequent
> words?
>

Read the "Do you need to index them ?" section of my notes and
"Tsearch2 internals": http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_internals
Both are important.

> Thanks,
>
>

 	Regards,
 		Oleg
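
For the second question, the stat() function Mike mentions can report both figures. A minimal sketch, assuming a hypothetical table messages whose tsvector column is called fti (these names are not from the thread):

```sql
-- Hypothetical table and column: messages.fti holds the tsvector.
-- stat() takes a query that returns one tsvector column and reports,
-- per word, the number of documents (ndoc) and total occurrences (nentry).

-- Ten most frequent words: candidates for the stop-word list.
SELECT word, ndoc, nentry
  FROM stat('SELECT fti FROM messages')
 ORDER BY ndoc DESC, nentry DESC, word
 LIMIT 10;

-- Total number of unique words in the index.
SELECT count(*) FROM stat('SELECT fti FROM messages');
```

The first query helps with reducing the most frequent words, the second with tracking the number of unique words while garbage is weeded out.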

Uroš Gruber | 28 Jun 09:54 2005

very slow search

Hi!

I need some help solving a problem with slow searching. I have many
tables with a ts index, but only one causes trouble. This is the table description:

--------------+-----------------------------+-----------------------
  id_news      | integer                     | not null
  title        | character varying(255)      | not null
  flash        | character varying(1024)     | not null
  link         | character varying(128)      |
  id_category  | integer                     | not null
  id_publisher | character varying           | not null
  picture      | character varying(64)       |
  published    | timestamp without time zone | not null
  is_visible   | boolean                     | not null default true
  created      | timestamp without time zone |
  body         | text                        |
  tsidx        | tsvector                    |
Indexes:
     "news_item_pkey" PRIMARY KEY, btree (id_news)
     "id_publisher_idx" btree (id_publisher)
     "is_visible_idx" btree (is_visible)
     "tsidx_idx" gist (tsidx)

and here is query explain

  explain analyze SELECT id_news, title, flash, published, picture FROM
news_item WHERE tsidx @@ to_tsquery('simple','janez');
                                                         QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
  Index Scan using tsidx_idx on news_item  (cost=0.00..97.99 rows=9 width=400) (actual time=4.503..849.755 rows=84 loops=1)
    Index Cond: (tsidx @@ '\'janez\''::tsquery)
  Total runtime: 850.065 ms

select count(*) from news_item;
  count
-------
   1933

I don't know why searching takes so much time, because other tables have
more than 60,000 rows and searching takes 10 or 20 ms.

What can I check or change to make this faster? I sped it up a little
with VACUUM FULL, because there was a lot of deleting. I use PostgreSQL
8.0.3 and tsearch2 from contrib.

regards

Oleg Bartunov | 29 Jun 13:53 2005

Re: very slow search

On Tue, 28 Jun 2005, Uroš Gruber wrote:

> Hi!
>
> I need some help solving a problem with slow searching. I have many tables
> with a ts index, but only one causes trouble. This is the table description:
>
> --------------+-----------------------------+-----------------------
> id_news      | integer                     | not null
> title        | character varying(255)      | not null
> flash        | character varying(1024)     | not null
> link         | character varying(128)      |
> id_category  | integer                     | not null
> id_publisher | character varying           | not null
> picture      | character varying(64)       |
> published    | timestamp without time zone | not null
> is_visible   | boolean                     | not null default true
> created      | timestamp without time zone |
> body         | text                        |
> tsidx        | tsvector                    |
> Indexes:
>    "news_item_pkey" PRIMARY KEY, btree (id_news)
>    "id_publisher_idx" btree (id_publisher)
>    "is_visible_idx" btree (is_visible)
>    "tsidx_idx" gist (tsidx)
>
> and here is query explain
>
> explain analyze SELECT id_news, title, flash, published, picture FROM
> news_item WHERE tsidx @@ to_tsquery('simple','janez');
>                                                        QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------------
> Index Scan using tsidx_idx on news_item  (cost=0.00..97.99 rows=9 width=400) (actual time=4.503..849.755 rows=84 loops=1)
>   Index Cond: (tsidx @@ '\'janez\''::tsquery)
> Total runtime: 850.065 ms
>
>
> select count(*) from news_item;
> count
> -------
>  1933
>
> I don't know why searching takes so much time, because other tables have more
> than 60,000 rows and searching takes 10 or 20 ms.
>
> What can I check or change to make this faster? I sped it up a little with
> VACUUM FULL, because there was a lot of deleting. I use PostgreSQL 8.0.3 and
> tsearch2 from contrib.

Uroš, what happens if you repeat your query several times? Are you sure you created
the tsidx column using the 'simple' dictionary?

>
> regards
>

 	Regards,
 		Oleg
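
Some further checks that could help narrow this down, offered as a sketch rather than a diagnosis. It uses the news_item table from Uroš's message and the length() function that tsearch2 defines for tsvector, which returns the number of lexemes in a vector. That large vectors or a bloated index are the cause here is only an assumption: a GiST index on tsvector is lossy, so matches are rechecked against the rows, and that gets more expensive as documents grow.

```sql
-- Size of the stored vectors: many unique lexemes per row is a possible culprit.
SELECT count(*)           AS n_rows,
       avg(length(tsidx)) AS avg_lexemes,
       max(length(tsidx)) AS max_lexemes
  FROM news_item;

-- Rebuild the GiST index, which may be bloated after many deletes,
-- and refresh the planner statistics.
REINDEX INDEX tsidx_idx;
VACUUM ANALYZE news_item;

-- Time the query again.
EXPLAIN ANALYZE
SELECT id_news, title, flash, published, picture
  FROM news_item
 WHERE tsidx @@ to_tsquery('simple', 'janez');
```

Comparing the lexeme counts with those of the fast 60,000-row tables would show whether this table's documents are unusually large.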

Uroš Gruber | 29 Jun 14:00 2005

Re: very slow search

Oleg Bartunov said the following on 29.6.2005 13:53:
> On Tue, 28 Jun 2005, Uroš Gruber wrote:
>> I need some help solving a problem with slow searching. I have many
>>
>> What can I check or change to make this faster? I sped it up a
>> little with VACUUM FULL, because there was a lot of deleting. I use
>> PostgreSQL 8.0.3 and tsearch2 from contrib.
>
>
> Uroš, what happens if you repeat your query several times? Are you sure you
> created the tsidx column using the 'simple' dictionary?
>
> 

It does not matter how many times I execute it; the result is always the same. As for
'simple', I'm sure. Here is the function I use:

CREATE OR REPLACE FUNCTION update_idx() RETURNS trigger AS $update_idx$
    BEGIN
        -- Build the vector from title, flash and body with the 'simple' configuration
        NEW.tsidx := to_tsvector('simple', coalesce(NEW.title,'') || ' ' ||
                                 coalesce(NEW.flash,'') || ' ' || coalesce(NEW.body,''));
        -- Debug output of the generated vector
        RAISE NOTICE '%', NEW.tsidx;
        RETURN NEW;
    END;
$update_idx$ LANGUAGE plpgsql;

regards

Uros
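
The message shows only the function. For it to run, it has to be attached to the table with a trigger, presumably along these lines. The trigger name is hypothetical, and the UPDATE is just one way to rebuild vectors for rows that existed before the trigger was created.

```sql
-- Attach update_idx() so it runs for every inserted or updated row.
-- The trigger name is hypothetical.
CREATE TRIGGER news_item_tsidx_update
    BEFORE INSERT OR UPDATE ON news_item
    FOR EACH ROW EXECUTE PROCEDURE update_idx();

-- Rebuild tsidx for existing rows with the same expression the function uses.
UPDATE news_item
   SET tsidx = to_tsvector('simple',
           coalesce(title, '') || ' ' || coalesce(flash, '') || ' ' || coalesce(body, ''));
```

Since the column is built with 'simple', queries should use to_tsquery('simple', ...) as well, as the EXPLAIN ANALYZE earlier in the thread does.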
