Searching from a Lucene created Index
Hi,
I'm trying to open an index created by Lucene 1.9.1. I know the index has not been
corrupted, since I can browse it with Luke, but when I try:
irb> Ferret::Index::IndexReader.new("./index'")
I get the following error message:
Ferret::FileNotFoundError: File Not Found Error occured at <except.c>:93 in xraise
Error occured in index.c:840 - sis_find_segments_file
couldn't find segments file
from (irb):7:in `initialize'
from (irb):7:in `new'
from (irb):7
The segments file is inside ./index, but something goes wrong.
Help is appreciated :)
Ferret::VERSION = 0.11.6
Thanks!
Custom highlighter/match vector access?
Hi everyone,
I know from the archives things have kinda slowed down on ferret and there's an effort ongoing with lucy, but I was wondering if anyone had discovered a way to enumerate the matches of a particular field in the document and get the offsets?
With what I'm trying to do, ferret will be indexing large portions of structured information, but I really don't want to store it all in the ferret index just to have highlighting. My understanding (I'm still new at this) is that if you index and store the match offsets, you can do this without storing the full text of the field.
Ideally, what I'd like is to expose the contents of the C MatchRange structure as an array of Ruby hash objects so that I could then use those offsets in the actual data store to create my own highlighted extracts (or something along those lines).
Short of adding a hacked version of searcher_highlight to the C API to do this and creating a corresponding wrapped Ruby version, is there any way to get to this information right now from the Ruby API? Alternatively, is there another/better way to do this besides storing the whole field values and using the built-in highlighter?
Any advice or pointers would be really appreciated.
Cheers,
ast
-- Andrew S. Townley <ast@...> http://atownley.org
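To make the idea concrete, here is a rough sketch of what building an extract from offsets could look like once the offsets are available on the Ruby side. Everything here is hypothetical: the `highlight_excerpt` helper and the `{ start:, end: }` hash shape are invented for illustration and are not part of the Ferret API, and the sketch assumes character offsets with an exclusive end (if the index reports byte offsets, the slicing would need adjusting).

```ruby
# Hypothetical helper, not part of Ferret: wraps each matched range of `text`
# in pre/post markers and trims the result to `context` characters around
# the first and last match.
# Assumption: each match is a hash with :start and :end character offsets
# (end exclusive) into the text held in the external data store.
def highlight_excerpt(text, matches, pre: "<b>", post: "</b>", context: 20)
  return "" if matches.empty?

  sorted   = matches.sort_by { |m| m[:start] }
  last_end = sorted.map { |m| m[:end] }.max
  from     = [sorted.first[:start] - context, 0].max
  to       = [last_end + context, text.length].min

  out = +""
  pos = from
  sorted.each do |m|
    next if m[:start] < pos # skip matches that overlap one already emitted

    out << text[pos...m[:start]] << pre << text[m[:start]...m[:end]] << post
    pos = m[:end]
  end
  out << text[pos...to]
  out
end

text    = "The quick brown fox jumps over the lazy dog"
matches = [{ start: 4, end: 9 }, { start: 16, end: 19 }]
puts highlight_excerpt(text, matches, context: 4)
```

The point of the sketch is only that the full field text never has to live in the index: the offsets come from the search side and the text comes from wherever it is already stored.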
Apache Lucy invitation
[[cross-posted from the ruby-forum list]]
Hi.
Back in 2006[0], Dave Balmain and Marvin Humphrey agreed to join forces on their search projects, Ferret and KinoSearch (respectively), and created the Lucy project at Apache.
Now it's 2010. Lucy is in the Apache Incubator.[1] I'm a part of the Apache Lucy project, and I'd like to invite you to become a part of it too.
The main goal of Apache Lucy is to provide the core C code for language-specific implementations, like Ferret does for Ruby. Now's your chance to help define what Apache Lucy looks like for Ruby.
Mailing list information at the Incubator site[1].
cheers,
Peter Karman
[0] http://www.perlmonks.org/?node_id=556317
[1] http://incubator.apache.org/lucy/
-- Peter Karman . http://peknet.com/ . peter@...
Ferret search engine as a daemon?
Hello, I just found out about Ferret. I am looking for a good, capable, fast search engine, but I would also like to talk to the search engine over TCP.
_______________________________________________ Ferret-talk mailing list Ferret-talk@... http://rubyforge.org/mailman/listinfo/ferret-talk
Can I get a list of all unique indexed words?
I have a requirement to provide a book-style browsable 'index' of all our resources (which are already indexed with Ferret). I thought that a nice simple way to do this would be to pull every unique indexed word from the Ferret index, so that when the user clicks on a word in the index, I just do a regular Ferret search using that word. With this approach, the only work I need to do is to generate the list of terms in the first place (and refresh it occasionally).
Is there a way to pull this out of the Ferret index somehow? It doesn't have to happen in real time; I could do it in a cron job and save the results to a text file, or whatever. So I don't mind if it's a slow process.
Grateful for any advice
- max
-- Posted via http://www.ruby-forum.com/.
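As a rough illustration of the cron-job side of this idea, the sketch below groups a stream of terms into a book-style index keyed by first letter. The `build_browse_index` helper and its `min_doc_freq` parameter are hypothetical names made up for this example; the input is assumed to be anything whose `each` yields `[term, doc_freq]` pairs, and a plain array stands in for the real index here.

```ruby
# Hypothetical helper, not part of Ferret: builds a book-style index mapping
# an uppercase first letter to a sorted list of unique terms.
# Assumption: `term_pairs` yields [term, doc_freq] pairs from its #each.
def build_browse_index(term_pairs, min_doc_freq: 1)
  index = Hash.new { |hash, key| hash[key] = [] }

  term_pairs.each do |term, doc_freq|
    next if term.empty?
    next if doc_freq < min_doc_freq # drop terms that are too rare to list

    index[term[0].upcase] << term
  end

  index.each_value { |terms| terms.sort!.uniq! }
  index.sort.to_h # plain Hash, letters in alphabetical order
end

# Stand-in data; in practice the pairs would come from the search index.
pairs = [["clarinet", 3], ["woodwind", 2], ["cello", 1]]
build_browse_index(pairs).each do |letter, terms|
  puts "#{letter}: #{terms.join(', ')}"
end
```

On the Ferret side, my understanding is that `Ferret::Index::IndexReader#terms(field)` returns a term enumerator whose `each` yields the term and its document frequency, which would fit this input shape, but that is worth checking against the installed version. Note also that the terms in the index are post-analysis, so with a stemming analyzer the list would contain stems rather than the original words.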
Problem with case sensitivity
I'm using a custom stem analyser in my searches and my indexing. The analyser is defined thus:
module Ferret::Analysis
  class StemmingAnalyzer
    def token_stream(field, text)
      text.downcase!
      RAILS_DEFAULT_LOGGER.debug "SEARCHING, field = #{field.inspect}, text = #{text.inspect}"
      tokenizer = StandardTokenizer.new(text)
      filter = StemFilter.new(tokenizer)
      filter
    end
  end
end
I use it in my indexing like this:
acts_as_ferret({ :store_class_name => true,
                 :ferret => { :analyzer => Ferret::Analysis::StemmingAnalyzer.new },
                 :fields => {:property_names => { :boost => 3.0 },
                             ....etc
                 }})
And in a search like this:
search_class.find_ids_with_ferret(search_term, {:limit => 10000, :analyzer => Ferret::Analysis::StemmingAnalyzer.new}) do |model, r_id, score|
  r_id = r_id.to_i
  ferret_ids << r_id
  self.scores_hash[r_id] = score
end
I have a problem with case sensitivity: searches only work when they are lowercase, even though the text that goes into the index looks like it contains uppercase. From the console:
>> resource.to_doc
=> {:resource_id=>"59", :property_names=>"Bb Clarinet Clarinet Family Woodwind Instrumental and Vocal Image Resources Types" }
>> TeachingObject.find_with_ferret("Vocal", :page => 1, :per_page => 1000).include?(resource)
=> false
>> TeachingObject.find_with_ferret("vocal", :page => 1, :per_page => 1000).include?(resource)
=> true
I think I have my stemming set up wrong; I'm not sure if it is even being used. I implemented it so that searches would match both pluralised and singular terms, and that seems to work, e.g.
>> TeachingObject.find_with_ferret("vocals", :page => 1, :per_page => 1000).include?(resource)
=> true
But the case sensitivity thing has me stumped. I thought that the downcase! call on the search term would make case irrelevant for searching but that seems not to be the case. Can anyone set me straight?
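One way to reason about this: a query term can only match if the indexed text and the query text both pass through the same normalization. The sketch below is a pure-Ruby toy model of that idea, not Ferret code; `normalize` and `matches?` are hypothetical stand-ins for an analyzer and a search, and the plural stripping is deliberately crude. It also uses the non-destructive `downcase`, since `downcase!` modifies the caller's string in place.

```ruby
# Toy stand-in for an analyzer (assumption: not how Ferret tokenizes or stems).
# Lowercases a copy of the text, splits on non-alphanumerics, strips a
# trailing "s" as a very crude stemmer.
def normalize(text)
  text.downcase.scan(/[a-z0-9]+/).map { |token| token.sub(/s\z/, "") }
end

# A query matches when every normalized query term appears among the
# normalized terms of the indexed text.
def matches?(indexed_text, query)
  indexed_terms = normalize(indexed_text)
  normalize(query).all? { |term| indexed_terms.include?(term) }
end

names = "Bb Clarinet Clarinet Family Woodwind Instrumental and Vocal Image Resources Types"
%w[Vocal vocal vocals].each do |query|
  puts "#{query}: #{matches?(names, query)}"
end
```

In this model all three queries match, because both sides go through `normalize`. If a capitalised query misses while the lowercase one hits, one thing worth checking is whether the query path actually runs through the same analyzer as the indexing path.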
Can't remove duplicates
Hello all, I can't remove duplicates. I'm using Ferret
to index log files in order to monitor application activity.
What I want to do is index data based on the uniqueness of
[filename, line] (actually it should be [host, filename, line]).
The code is as follows:
if !$indexer
  field_infos = Ferret::Index::FieldInfos.new(:index => :untokenized_omit_norms,
                                              :term_vector => :no)
  field_infos.add_field(:content, :store => :yes, :index => :yes)
  $indexer = Ferret::I.new(:path => index_dir,
                           :field_infos => field_infos,
                           :key => [:filename, :line],
                           :max_buffered_docs => 100)
  #$indexer ||= Ferret::I.new(:path=>index_dir, :key => ['filename', 'line'], :max_buffered_docs=>100) #unique host,file_name,line
  #$indexer.field_infos.add_field(:time,
  #  #:default_boost => 20,
  #  :store => :yes,
  #  :index => :untokenized,
  #  :term_vector => :no)
end
The problem is that a new document gets indexed even if the [filename, line]
is the same. Changing to :key => ["filename", "line"] doesn't help
either.
What's the problem? Thanks.
Ferret Usability
Guys, I am new to Ferret, and I have mixed feelings about it.
On one side, I really like the simplicity of the system: it's easy to deploy and use, and I have plenty of choices for integration, from aaf to rolling my own, which isn't too hard either.
On the other hand, I've heard a lot of horror stories, from index corruption to segfaults. The most typical thread I can find is here: http://groups.google.com/group/rubyonrails-deployment/browse_thread/thread/980fe7cb20cb97dd
Even Ezra at EY is basically saying Ferret is unusable.
What is the situation now? Can anyone pin down what actually caused their segfaults or index corruption?