Amit | 12 Dec 09:09
Jerrie Union | 5 Apr 23:34

Searching from a Lucene created Index

Hi,

I'm trying to open an index created by Lucene 1.9.1. I know the index has not been
corrupted, since I can browse it with Luke, but when I try:
irb> Ferret::Index::IndexReader.new("./index'")

I get the following error message:
Ferret::FileNotFoundError: File Not Found Error occured at <except.c>:93 in xraise
Error occured in index.c:840 - sis_find_segments_file
	couldn't find segments file

	from (irb):7:in `initialize'
	from (irb):7:in `new'
	from (irb):7

The segments file is inside ./index, but something goes wrong.

Help is appreciated :)

Ferret::VERSION = 0.11.6

Thanks!
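
One thing worth ruling out first: the path string in the irb line above ends with a stray apostrophe ("./index'"), so the reader may be looking in a directory that does not exist. The following is only a diagnostic sketch in plain Ruby, not part of Ferret; the helper name `segments_files` is made up for illustration. It lists whatever files starting with "segments" sit in a given directory.

```ruby
# Diagnostic sketch (plain Ruby, no Ferret needed).
# `segments_files` is a hypothetical helper name, not a Ferret API.
# Returns the names of files starting with "segments" in index_dir,
# or an empty array when the directory does not exist.
def segments_files(index_dir)
  return [] unless File.directory?(index_dir)
  Dir.glob(File.join(index_dir, "segments*")).map { |f| File.basename(f) }.sort
end

# Compare the intended path with the one that was actually typed.
puts segments_files("./index").inspect
puts segments_files("./index'").inspect
```

If the first call lists a segments file and the second prints an empty array, the typo is the likely culprit; if both come back empty, the working directory of the irb session is the next thing to check.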
Andrew S. Townley | 23 Feb 14:57

Custom highlighter/match vector access?

Hi everyone,

I know from the archives things have kinda slowed down on ferret and there's an effort ongoing with lucy, but
I was wondering if anyone had discovered a way to enumerate the matches of a particular field in the
document and get the offsets?

With what I'm trying to do, ferret will be indexing large portions of structured information, but I really
don't want to store it all in the ferret index just to have highlighting.  My understanding (I'm still new at
this) is that if you index and store the match offsets, you can do this without storing the full text of the field.

Ideally, what I'd like is to expose the contents of the C MatchRange structure as an array of Ruby hash
objects so that I could then use those offsets in the actual data store to create my own highlighted
extracts (or something along those lines).

Short of adding a hacked version of searcher_highlight to the C API to do this and creating a corresponding
wrapped Ruby version, is there any way to get to this information right now from the Ruby API?

Alternatively, is there another/better way to do this besides storing the whole field values and using the
built-in highlighter?

Any advice or pointers would be really appreciated.

Cheers,

ast
--
Andrew S. Townley <ast@...>
http://atownley.org
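
To make the idea above concrete, here is a minimal sketch of the consuming side only: given match offsets as an array of Ruby hashes, it builds a highlighted string from text held in an external store. The hash shape (`:start`, `:end`, end-exclusive) and the method name `highlight` are assumptions for illustration, and nothing here claims Ferret exposes offsets this way today. It also indexes by character, so byte offsets from an index would need `String#byteslice` for multibyte text.

```ruby
# Sketch: apply externally supplied match offsets to text from your own store.
# ASSUMPTION: ranges look like {:start => 0, :end => 6}, end-exclusive,
# measured in characters. This shape is hypothetical.
def highlight(text, ranges, pre = "<b>", post = "</b>")
  out = +""
  pos = 0
  ranges.sort_by { |r| r[:start] }.each do |r|
    next if r[:start] < pos # skip ranges that overlap an earlier match
    out << text[pos...r[:start]] << pre << text[r[:start]...r[:end]] << post
    pos = r[:end]
  end
  out << text[pos..-1].to_s # append whatever follows the last match
end

text = "ferret indexes large fields"
puts highlight(text, [{ :start => 0, :end => 6 }, { :start => 15, :end => 20 }])
# prints: <b>ferret</b> indexes <b>large</b> fields
```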
Peter Karman | 21 Nov 18:23

Apache Lucy invitation

[[cross-posted from the ruby-forum list]]

Hi.

Back in 2006[0], Dave Balmain and Marvin Humphrey agreed to join forces
on their search projects, Ferret and KinoSearch (respectively), and
created the Lucy project at Apache.

Now it's 2010. Lucy is in the Apache Incubator.[1]

I'm a part of the Apache Lucy project, and I'd like to invite you to
become a part of it too.

The main goal of Apache Lucy is to provide the core C code for
language-specific implementations, like Ferret does for Ruby. Now's your
chance to help define what Apache Lucy looks like for Ruby.

Mailing list information at the Incubator site[1].

cheers,
Peter Karman

[0] http://www.perlmonks.org/?node_id=556317
[1] http://incubator.apache.org/lucy/

-- 
Peter Karman  .  http://peknet.com/  .  peter@...
Zakay Danial | 21 May 14:10

Ferret search engine as a daemon?

Hello, I just found out about Ferret. I am looking for a good/capable/fast search engine, and I would also like to talk to the search engine over TCP.

Is there a way to run the search engine as a daemon so that it handles requests over sockets?

Kind regards
Zach
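
As a rough illustration of what such a daemon could look like, the sketch below wraps a search call in a line-based TCP server using only Ruby's standard `socket` library. Everything specific in it is an assumption: the `DOCS` table and `search` method are stand-ins for a real index lookup, and the one-query-per-line protocol is invented for the example.

```ruby
require "socket"

# Hypothetical stand-in for a real index; replace `search` with an actual
# index lookup in practice.
DOCS = { 1 => "fast search engine", 2 => "ruby tcp daemon" }.freeze

# Returns the ids of documents whose text contains the query.
def search(query)
  DOCS.select { |_id, text| text.include?(query.downcase) }.keys
end

# Accept connections forever; each client sends one query per line and
# receives one line of comma-separated document ids per query.
def serve(server)
  loop do
    client = server.accept
    Thread.new(client) do |c|
      while (line = c.gets)
        c.puts search(line.strip).join(",")
      end
      c.close
    end
  end
end

# To run as a daemon (the port number is arbitrary):
#   serve(TCPServer.new("127.0.0.1", 9010))
```

A thread per connection keeps the sketch short; a production daemon would also want timeouts, error handling, and a single writer for index updates.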
_______________________________________________
Ferret-talk mailing list
Ferret-talk@...
http://rubyforge.org/mailman/listinfo/ferret-talk
Max Williams | 8 Dec 15:54

Can I get a list of all unique indexed words?

I have a requirement to provide a book-style browseable 'index' of all
our resources (which are already indexed with ferret).  I thought that a
nice simple way to do this would be to pull every unique indexed word
from the ferret index, so that when the user clicks on a word in the
index, I just do a regular ferret search using that word.

With this approach, the only work I need to do is to generate the list
of terms in the first place (and refresh it occasionally).  Is there a
way to pull this out of the ferret index somehow?  It doesn't have to
happen in real time; I could do it in a cron job and save the results to
a text file, or whatever.  So, I don't mind if it's a slow process.

Grateful for any advice - max
-- 
Posted via http://www.ruby-forum.com/.
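
A possible shape for the cron job described above, offered as a sketch rather than a tested recipe: as far as I recall, Ferret's `IndexReader#terms(field)` returns a term enumerator whose `each` yields the term and its document frequency. The helper below relies only on that interface, so it can be tried against a stub; the helper name, the `:content` field, and the file paths are assumptions.

```ruby
# Collect every unique term for one field.
# ASSUMPTION: `reader` responds to #terms(field) and returns an object whose
# #each yields (term, doc_freq), as I recall Ferret::Index::IndexReader does.
def unique_terms(reader, field)
  terms = []
  reader.terms(field).each { |term, _doc_freq| terms << term }
  terms
end

# With Ferret installed, usage might look like this (untested here; the
# path and field name are placeholders):
#   require "ferret"
#   reader = Ferret::Index::IndexReader.new("/path/to/index")
#   File.write("terms.txt", unique_terms(reader, :content).join("\n"))
```

Note that the terms come back as they were indexed, so with a stemming analyzer the list would contain stems rather than the original words.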
Max Williams | 26 Nov 12:13

Problem with case sensitivity

I'm using a custom stem analyser in my searches and my indexing.  The analyser is defined thus:

module Ferret::Analysis
  class StemmingAnalyzer
    def token_stream(field, text)
      text.downcase!
      RAILS_DEFAULT_LOGGER.debug "SEARCHING, field = #{field.inspect}, text = #{text.inspect}"
      tokenizer = StandardTokenizer.new(text)
      filter = StemFilter.new(tokenizer)
      filter
    end
  end
end

I use it in my indexing like this:

  acts_as_ferret({ :store_class_name => true,
                   :ferret => { :analyzer => Ferret::Analysis::StemmingAnalyzer.new },
                   :fields => {:property_names =>  { :boost => 3.0 },
                               ....etc
                   }})

And in a search like this:

search_class.find_ids_with_ferret(search_term, {:limit => 10000, :analyzer => Ferret::Analysis::StemmingAnalyzer.new}) do |model, r_id, score|
      r_id = r_id.to_i
      ferret_ids << r_id
      self.scores_hash[r_id] = score
end

I have a problem with case sensitivity: searches only work when they are lowercase, even though the text stored in the index appears to be capitalised.  From the console:

>> resource.to_doc
=> {:resource_id=>"59", :property_names=>"Bb Clarinet Clarinet Family Woodwind Instrumental and Vocal Image Resources Types" }
>> TeachingObject.find_with_ferret("Vocal", :page => 1, :per_page => 1000).include?(resource)
=> false
>> TeachingObject.find_with_ferret("vocal", :page => 1, :per_page => 1000).include?(resource)
=> true

I think I have my stemming set up wrong; I'm not sure it is even being used.  I implemented it so that searches match both pluralised and singular terms, and that seems to work, e.g.

>> TeachingObject.find_with_ferret("vocals", :page => 1, :per_page => 1000).include?(resource)
=> true

But the case sensitivity thing has me stumped.  I thought that the downcase! call on the search term would make case irrelevant for searching, but that seems not to be so.  Can anyone set me straight?
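
For what it is worth, the usual requirement is that exactly the same normalisation runs at index time and at query time. Ferret ships a LowerCaseFilter that can sit in the token chain between the tokenizer and the StemFilter, which avoids mutating the caller's string with downcase!. The toy model below is plain Ruby, not Ferret, and every name in it is invented; it only illustrates that once both paths share one non-mutating analyzer, "Vocal" and "vocal" behave identically. It does not diagnose which of the two paths is skipping the analyzer in the setup above.

```ruby
# Toy model (NOT Ferret): one analyzer feeds both indexing and searching.
def analyze(text)
  text.downcase.scan(/[a-z0-9]+/) # non-mutating downcase, then tokenize
end

# Build a term => [doc ids] inverted index from {id => text}.
def build_index(docs)
  index = Hash.new { |h, k| h[k] = [] }
  docs.each { |id, text| analyze(text).each { |tok| index[tok] << id } }
  index
end

# Return ids of documents containing every query token.
def search(index, query)
  analyze(query).map { |tok| index.fetch(tok, []) }.reduce(:&) || []
end

index = build_index({ 59 => "Bb Clarinet Woodwind Instrumental and Vocal Image Resources" })
p search(index, "Vocal") # prints [59]
p search(index, "vocal") # prints [59]
```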

femto Zheng | 18 Aug 09:30

How do I index very large file?

Hello all, I'm writing a monitoring application
that fetches an application's log files and indexes them.
How do I index a very large file,
say up to several GB? The application may write
very large log files in a short time.
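
One common approach, sketched here under assumptions, is to avoid reading the file into memory at all and instead add one document per line while streaming. The helper name `index_log` and the field names are illustrative; `index` is anything that accepts a Hash via `<<`, which a Ferret index does as far as I know, and a plain Array works for a dry run.

```ruby
# Stream a log file line by line so memory use stays flat whatever its size.
# ASSUMPTION: `index` responds to #<< with a Hash document.
# Returns the number of lines indexed.
def index_log(path, index)
  count = 0
  name = File.basename(path)
  File.foreach(path).with_index(1) do |text, line_no|
    index << { :filename => name, :line => line_no, :content => text.chomp }
    count += 1
  end
  count
end

# Dry run against an Array (the log path is a placeholder):
#   docs = []
#   index_log("/var/log/app.log", docs)
```

For a log that is still being written, the same loop could start from a saved byte offset so that each run only indexes the new lines.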
femto Zheng | 18 Aug 08:27

Can't remove duplicate

Hello all, I can't remove duplicates. I'm using ferret
to index log files in order to monitor application activity.
What I want to do is index data based on the uniqueness of
[filename, line] (actually it should be [host, filename, line]).
The code follows:

if !$indexer
  field_infos = Ferret::Index::FieldInfos.new(:index => :untokenized_omit_norms,
                                              :term_vector => :no)
  field_infos.add_field(:content, :store => :yes, :index => :yes)

  $indexer = Ferret::I.new(:path => index_dir,
                           :field_infos => field_infos,
                           :key => [:filename, :line],
                           :max_buffered_docs => 100)

  #$indexer ||= Ferret::I.new(:path => index_dir, :key => ['filename', 'line'],
  #                           :max_buffered_docs => 100) # unique host,file_name,line
  #$indexer.field_infos.add_field(:time,
  #                               #:default_boost => 20,
  #                               :store => :yes,
  #                               :index => :untokenized,
  #                               :term_vector => :no)
end
The problem is that a new document gets indexed even if the [filename, line]
is the same. Even if I change to :key => ["filename", "line"], it still doesn't
work.
What's the problem? Thanks.
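
This does not explain why the multi-field key fails, but one workaround to try is collapsing the identifying fields into a single string key. As I recall, the Ferret documentation for :key warns that multi-field keys slow indexing and that key fields must not be broken up by the analyzer. The sketch below is plain Ruby with invented names (`doc_key`, `log_doc`, the `:id` field); it stringifies every part so that 12 and "12" produce the same key, and uses a Hash as a stand-in for the index.

```ruby
# Build one string key from the identifying fields. Values are stringified
# so an Integer line number and its String form map to the same key.
def doc_key(host, filename, line)
  [host, filename, line].map(&:to_s).join("|")
end

# Hypothetical document shape with a single :id key field.
def log_doc(host, filename, line, content)
  { :id => doc_key(host, filename, line), :host => host,
    :filename => filename, :line => line.to_s, :content => content }
end

# Dry run with a Hash standing in for the index: same key replaces.
store = {}
[log_doc("web1", "app.log", 12, "first"),
 log_doc("web1", "app.log", "12", "second")].each { |d| store[d[:id]] = d }
p store.size # prints 1
```

With a real index the equivalent would presumably be a single :key => :id, with :id indexed untokenized.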
Yeung William | 18 Aug 05:41

Ferret Usability

Guys,

I am new to Ferret, and I have mixed feelings about it. On one side
I really like the simplicity of the system: it's easy to deploy and
use, and I have a lot of choices on integration, from aaf to rolling my
own, which isn't too hard either. On the other hand, I have heard a lot of horror
stories, from index corruption to segfaults. The most typical thread
I can find is here:

http://groups.google.com/group/rubyonrails-deployment/browse_thread/thread/980fe7cb20cb97dd

Even Ezra @ EY is basically saying Ferret is unusable. May I ask what
the situation is now? Can anyone pin down what actually happened with
their segfaults/index corruption?
