Toph | 1 Jul 2008 01:13

Re: Incorrect Token Offset when using multiple fieldable instance


Interesting discussion... glad I'm not the only one with this challenge.

Michael McCandless-2 wrote:
> 
> EG, if you use Highlighter on a  
> multi-valued field indexed with stored field & term vectors and say  
> the first field ended with a stop word that was filtered out, then  
> your offsets will be off and the wrong parts will be highlighted 
> 
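The drift Michael describes can be modeled in a few lines of plain Java. This is a toy sketch, not Lucene's indexing code. It assumes a regex tokenizer with a tiny stop list. It also assumes, as described above, that each later instance's offsets are shifted by the end offset of the previous instance's last kept token rather than by that instance's full length. The class name OffsetDrift and its stop list are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model (hypothetical, not Lucene code) of offset drift across the
// instances of a multi-valued field.
class OffsetDrift {
    // Tiny illustrative stop list; real analyzers use larger ones.
    static final Set<String> STOP_WORDS = Set.of("the", "a", "of");
    static final Pattern WORD = Pattern.compile("[A-Za-z]+");

    // Returns {start, end} offsets of every kept token across all values.
    // lastTokenBase == true: the next value's base offset advances only to the
    // end of the previous value's last kept token (the drifting behaviour).
    // lastTokenBase == false: the base advances by the full value length, so
    // offsets match the plain concatenation of the values.
    static List<int[]> offsets(List<String> values, boolean lastTokenBase) {
        List<int[]> out = new ArrayList<>();
        int base = 0;
        for (String value : values) {
            int lastKeptEnd = 0;
            Matcher m = WORD.matcher(value);
            while (m.find()) {
                if (STOP_WORDS.contains(m.group().toLowerCase(Locale.ROOT))) {
                    continue; // stop word: dropped, so it contributes no offset
                }
                out.add(new int[] {base + m.start(), base + m.end()});
                lastKeptEnd = m.end();
            }
            base += lastTokenBase ? lastKeptEnd : value.length();
        }
        return out;
    }
}
```

With the values "fast search of the" and "index", the token "index" gets start offset 11 under the drifting scheme, while its true position in the concatenated text "fast search of theindex" is 18.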

I found this post while attempting exactly this, and I can confirm that,
yes, the offsets are incorrect for every instance of the field in the
document except the first, so they are useless for highlighting.  I tried
concatenating all instances of the field, but of course if an instance
ended with punctuation or a stop word, those characters were not added to
the offset.  I'll try the suggested workaround of adding a dummy term at
the end of each field value, but a better API would be for "offset" to
become a pair of ints: the first being the index of the Field in
getFields(name), and the second being the offset within that instance of
the field.

Christopher
--

-- 
View this message in context: http://www.nabble.com/Incorrect-Token-Offset-when-using-multiple-fieldable-instance-tp15833468p18206216.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
Paul J. Lucas | 1 Jul 2008 02:48

Re: FileNotFoundException in ConcurrentMergeScheduler

Sorry for the radio silence.  I changed my code around so that a  
single IndexReader and IndexSearcher are shared.  Since doing that,  
I've not seen the problem.  That being the case, I didn't pursue the  
issue.

I still think there's a bug because the code I had previously, IMHO,  
should have worked just fine.

- Paul

On May 30, 2008, at 2:59 AM, Michael McCandless wrote:

> Paul J. Lucas wrote:
>
>> On May 29, 2008, at 6:35 PM, Michael McCandless wrote:
>>
>>> Can you use lsof (or something similar) to see how many files you  
>>> have?
>>
>> FYI: I personally can't reproduce this; only a coworker can and  
>> even then it's sporadic, so it could take a little while.
>
> If possible, could you call IndexWriter.setInfoStream(...) and  
> capture the log produced leading up to the exception?  It could help  
> see if there is a bug here, by showing whether IndexWriter actually  
> deleted that file or not.
>
>>> Merging, especially several running at once, can greatly increase  
>>> open file count, and especially if mergeFactor is increased.
>>

Paul J. Lucas | 1 Jul 2008 02:55

Sorting case-insensitively

If I have a SortField with a type of STRING, is there any way to sort  
in a case-insensitive manner?

- Paul
Erik Hatcher | 1 Jul 2008 04:00

Re: Sorting case-insensitively


On Jun 30, 2008, at 8:55 PM, Paul J. Lucas wrote:
> If I have a SortField with a type of STRING, is there any way to  
> sort in a case-insensitive manner?

Only if you unify the case (lower case everything) on the client side  
that you send to Solr, but in general no.

You can use a text field type that uses a KeywordTokenizer(Factory)  
and lowercase on the Solr-side though.  The Solr example schema has  
one such "alphaOnlySort" field type.

	Erik
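Erik's first suggestion, unifying the case before indexing, amounts to storing a lower-cased copy of the value in a separate untokenized sort field and sorting on that copy. The plain-Java sketch below only shows the normalization and the resulting order. The class name is hypothetical, and the Lucene or Solr field setup is not shown.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Sketch of "unify the case before indexing": the lower-cased key is what
// would be stored in a separate, untokenized sort field.
class LowercaseSortKey {
    // Normalizes a value into the key that would be indexed for sorting.
    static String sortKey(String value) {
        return value.toLowerCase(Locale.ROOT);
    }

    // Sorts the original values by their normalized keys (stable sort).
    static List<String> sortByKey(List<String> values) {
        List<String> sorted = new ArrayList<>(values);
        sorted.sort(Comparator.comparing(LowercaseSortKey::sortKey));
        return sorted;
    }
}
```

A raw case-sensitive string comparison (such as String.compareTo) puts "Banana" before "apple"; sorting on the lower-cased key gives apple, Banana, cherry.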
lutan | 1 Jul 2008 04:51

RE: how to statistics categories amount


Thanks for the reply, but I don't know Solr well.
Is Solr built on top of the Lucene core, or does it modify the
Lucene core to provide this category-count function?
Can I achieve a similar function using only
the Lucene core?
> From: erik <at> ehatchersolutions.com
> To: java-user <at> lucene.apache.org
> Subject: Re: how to statistics categories amount
> Date: Sat, 28 Jun 2008 05:36:12 -0400
>
> On Jun 28, 2008, at 3:57 AM, lutan wrote:
>> If I search for a keyword like 'computer' on a shopping website,
>> the results may contain:
>> total:
>> (1000) products
>> categories:
>> pc (500) products
>> notebook (300) products
>> server (200) products
>>
>> So how do I get each category's count?  I tried running many
>> searches for each user search, but it is too slow.
>
> This is a case where you are probably better off starting with Solr,
> which supports faceting natively.
>
> The main trick to making this fast is coming up with cached sets of
> each of the categories and intersecting each of those sets with the
> main result set and using the cardinality of the intersected sets for
> the counts.  Again, Solr is what I'd recommend as a starting point for
> you.
>
> Erik
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe <at> lucene.apache.org
> For additional commands, e-mail: java-user-help <at> lucene.apache.org
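Erik's description maps directly onto java.util.BitSet. You cache one set of document IDs per category, AND each set with the set of documents matching the query, and read off the cardinality. In the sketch below, the doc IDs and category sets are made-up inputs. Filling them from a Lucene index (for example through a filter or hit collector) is not shown, and FacetCounts is a hypothetical name.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of facet counting by set intersection: one cached BitSet of doc IDs
// per category, intersected with the BitSet of documents matching the query.
class FacetCounts {
    static Map<String, Integer> count(Map<String, BitSet> categoryDocs, BitSet results) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> e : categoryDocs.entrySet()) {
            // BitSet.and modifies its receiver, so work on a copy to keep
            // the cached category set intact for the next query.
            BitSet hits = (BitSet) e.getValue().clone();
            hits.and(results);
            counts.put(e.getKey(), hits.cardinality());
        }
        return counts;
    }
}
```

With pc = {0,1,2}, notebook = {3,4}, server = {5} and results {1,2,3,5}, the counts come out as pc 2, notebook 1, server 1. Only one search is run per user query; the per-category work is a cheap bitwise AND.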
Paul J. Lucas | 1 Jul 2008 05:08

Re: Sorting case-insensitively

On Jun 30, 2008, at 7:00 PM, Erik Hatcher wrote:

> On Jun 30, 2008, at 8:55 PM, Paul J. Lucas wrote:
>> If I have a SortField with a type of STRING, is there any way to  
>> sort in a case-insensitive manner?
>
> Only if you unify the case (lower case everything) on the client  
> side that you send to Solr, but in general no.
>
> You can use a text field type that uses a KeywordTokenizer(Factory)  
> and lowercase on the Solr-side though.  The Solr example schema has  
> one such "alphaOnlySort" field type.

Couldn't I also use a custom SortComparator?

- Paul
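A custom comparator would fold case at sort time instead of at index time. The sketch below shows only the comparison logic in plain Java: String.CASE_INSENSITIVE_ORDER, or a java.text.Collator at SECONDARY strength for locale-aware ordering. How it would be wired into Lucene's SortComparator is not shown, and the class name is hypothetical.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Plain-Java sketch of the comparison a case-insensitive custom comparator
// would perform; this is not Lucene's SortComparator API.
class CaseInsensitiveOrder {
    // Simple ordinal comparison that ignores case differences.
    static List<String> sortIgnoringCase(List<String> values) {
        List<String> sorted = new ArrayList<>(values);
        sorted.sort(String.CASE_INSENSITIVE_ORDER);
        return sorted;
    }

    // Locale-aware variant: SECONDARY strength ignores case (tertiary)
    // differences but still distinguishes accents.
    static List<String> sortWithCollator(List<String> values, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(Collator.SECONDARY);
        List<String> sorted = new ArrayList<>(values);
        sorted.sort(collator);
        return sorted;
    }
}
```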
lutan | 1 Jul 2008 08:33

RE: How Lucene Search


I have the same question.
Could anyone explain which classes are called during each step of a search?
Thanks!
> Date: Thu, 26 Jun 2008 00:24:08 -0700
> From: blazingwolf7 <at> gmail.com
> To: java-user <at> lucene.apache.org
> Subject: How Lucene Search
>
> Hi,
>
> I am fairly new to Lucene and am currently going over its source code.
> I have read through the code a few times, mapping it out, but I seem to
> be facing a problem.  I can follow it all the way to the calculation of
> the score for each result obtained, but strangely I have not managed to
> locate the part where Lucene opens the index and checks for the
> matching term.
>
> What I mean is that I want to see how Lucene actually opens the index
> and performs the search.  I went through all the methods in IndexReader,
> IndexSearcher and some other related classes, but still failed to
> locate the method responsible.
>
> Could anyone help me with this?  Thanks
> --
> View this message in context: http://www.nabble.com/How-Lucene-Search-tp18127970p18127970.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
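Conceptually, the step the poster is looking for is a lookup in the term dictionary followed by a walk over that term's postings list. The toy inverted index below illustrates only that idea. It is not Lucene's file format or class structure, and ToyIndex and its methods are hypothetical. In Lucene 2.x, roughly speaking, TermQuery's weight obtains such an iterator from IndexReader.termDocs(Term) and hands it to TermScorer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Toy inverted index (hypothetical, not Lucene's classes or file format).
class ToyIndex {
    // Sorted term dictionary: term -> postings of {docId, termFreq}, in docId order.
    private final TreeMap<String, List<int[]>> dictionary = new TreeMap<>();
    private int nextDocId = 0;

    // Indexes one document and returns its doc ID.
    int addDocument(String text) {
        int docId = nextDocId++;
        Map<String, Integer> freqs = new TreeMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                freqs.merge(token, 1, Integer::sum);
            }
        }
        for (Map.Entry<String, Integer> e : freqs.entrySet()) {
            dictionary.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                      .add(new int[] {docId, e.getValue()});
        }
        return docId;
    }

    // Search step: look the term up in the dictionary, then return its
    // postings; a scorer would iterate these to compute per-document scores.
    List<int[]> lookup(String term) {
        return dictionary.getOrDefault(term.toLowerCase(Locale.ROOT), List.of());
    }
}
```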
Michael McCandless | 1 Jul 2008 10:37

Re: FileNotFoundException in ConcurrentMergeScheduler


Hmmm then it sounds possible you were in fact running out of file  
descriptors.

What was your mergeFactor set to?

Mike

Paul J. Lucas wrote:

> Sorry for the radio silence.  I changed my code around so that a  
> single IndexReader and IndexSearcher are shared.  Since doing that,  
> I've not seen the problem.  That being the case, I didn't pursue the  
> issue.
>
> I still think there's a bug because the code I had previously, IMHO,  
> should have worked just fine.
>
> - Paul
>
>
> On May 30, 2008, at 2:59 AM, Michael McCandless wrote:
>
>> Paul J. Lucas wrote:
>>
>>> On May 29, 2008, at 6:35 PM, Michael McCandless wrote:
>>>
>>>> Can you use lsof (or something similar) to see how many files you  
>>>> have?
>>>

Michael McCandless | 1 Jul 2008 10:48

Re: Problems with reopening IndexReader while pushing documents to the index


OK thanks for the answers below.

One thing to realize is, with this specific corruption, you will only  
hit the exception if the one term that has the corruption is queried  
on.  Ie, only a certain term in a query will hit the corruption.

That's great news that it's easily reproduced -- can you post the code  
you're using that hits it?  It's easily reproduced when starting from  
a newly created index, right?

Mike

Sascha Fahl wrote:

> It is easily reproduced.  The strange thing is that when I check the
> IndexReader for currentness, some IndexReaders seem to get the
> corrupted version of the index and some do not (the IndexReader gets
> reopened around 10 times while adding the documents to the index and
> sending 10,000 requests to the index).  So maybe something goes wrong
> when the IndexReader fetches the index while the IndexWriter flushes
> data to the index (I did not change the default MergePolicy)?
> I will do the CheckIndex thing asap.
> I do not change any of the IndexWriter settings.  This is how I
> initialize a new IndexWriter: this.indexWriter = new
> IndexWriter(index_dir, new LiveAnalyzer(), false);
> I am working with a singleton (so only one thread adds documents to
> the index).
> This is what java -version says: java version "1.5.0_13"
> Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_13- 

Michael McCandless | 1 Jul 2008 10:52

Re: Problems with reopening IndexReader while pushing documents to the index


By "does not help" do you mean CheckIndex never detects this  
corruption, yet you then hit that exception when searching?

By "reopening fails" what do you mean?  I thought reopen works fine,  
but then it's only the search that fails?

Mike

Sascha Fahl wrote:

> Checking the index after adding documents and before reopening the
> IndexReader does not help.  After adding documents nothing bad
> happens and CheckIndex says the index is all right.  When I check
> the index before reopening it,
> CheckIndex does not detect any corruption and says the index is OK,
> yet reopening fails.
>
> Sascha
>
> On Jun 30, 2008, at 6:34 PM, Michael McCandless wrote:
>
>>
>> This is spooky: that exception means you have some sort of index  
>> corruption.  The TermScorer thinks it found a doc ID 37389, which  
>> is out of bounds.
>>
>> Reopening IndexReader while IndexWriter is writing should be  
>> completely fine.
>>

