Konstantyn Smirnov | 1 Aug 2012 13:03

Re: is there a way to control when merges happen?

Hi Mike.

I have a LogDocMergePolicy + ConcurrentMergeScheduler in my setup.
I tried adding new segments with 800-5000 documents each, one after
another, but the scheduler seemed to ignore them at first; only after some
time did it merge some of them.

I have the option of using a quartz-scheduler to trigger my merges, but I
would like to keep that logic where it really belongs: in Lucene's
mergeScheduler.

Is there a way to control merge scheduling now (with 3.6.0)?
When exactly is the scheduler triggered: upon adding a new segment, or does
it run every n hours? Can I configure the scheduler to do both?

TIA

--
View this message in context: http://lucene.472066.n3.nabble.com/is-there-a-way-to-control-when-merges-happen-tp560736p3998571.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
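
A note on the question above: as far as I know, Lucene has no timer-based
merging. IndexWriter asks the merge policy for work when a new segment is
flushed, and a log-style policy such as LogDocMergePolicy only selects a merge
once mergeFactor segments (10 by default) sit at roughly the same size level.
That would explain why the first few segments look ignored. The sketch below
is a toy model of that rule in plain Java, not Lucene code: the class
ToyLogMergePolicy and all of its methods are hypothetical, and the level
calculation is much simpler than the real one.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a log-style merge policy. Hypothetical; this is NOT Lucene's implementation. */
final class ToyLogMergePolicy {
    private final int mergeFactor;   // segments of one level needed before a merge
    private final int minMergeDocs;  // smaller segments are treated as this size
    private final List<Integer> segments = new ArrayList<>(); // doc counts, oldest first
    private int mergeCount = 0;

    ToyLogMergePolicy(int mergeFactor, int minMergeDocs) {
        if (mergeFactor < 2 || minMergeDocs < 1) {
            throw new IllegalArgumentException("mergeFactor >= 2 and minMergeDocs >= 1 required");
        }
        this.mergeFactor = mergeFactor;
        this.minMergeDocs = minMergeDocs;
    }

    /** Level = floor(log_mergeFactor(size)), computed with integers to avoid rounding errors. */
    int level(int docCount) {
        long size = Math.max(docCount, minMergeDocs);
        int level = 0;
        long bound = mergeFactor;
        while (size >= bound) {
            level++;
            bound *= mergeFactor;
        }
        return level;
    }

    /** Models a flush: the new segment is added, then the policy is consulted. */
    void addSegment(int docCount) {
        segments.add(docCount);
        maybeMerge();
    }

    /** Merges the newest mergeFactor segments while they all share one level (cascading). */
    private void maybeMerge() {
        while (segments.size() >= mergeFactor) {
            int n = segments.size();
            int newestLevel = level(segments.get(n - 1));
            for (int i = n - mergeFactor; i < n; i++) {
                if (level(segments.get(i)) != newestLevel) {
                    return; // not enough same-level segments yet
                }
            }
            int merged = 0;
            for (int i = 0; i < mergeFactor; i++) {
                merged += segments.remove(segments.size() - 1);
            }
            segments.add(merged);
            mergeCount++;
        }
    }

    int segmentCount() { return segments.size(); }

    int mergeCount() { return mergeCount; }

    int totalDocs() {
        int total = 0;
        for (int docs : segments) {
            total += docs;
        }
        return total;
    }

    public static void main(String[] args) {
        ToyLogMergePolicy policy = new ToyLogMergePolicy(10, 1000);
        for (int flush = 1; flush <= 12; flush++) {
            policy.addSegment(2000);
            System.out.println("flush " + flush + ": " + policy.segmentCount()
                    + " segment(s), " + policy.mergeCount() + " merge(s) so far");
        }
    }
}
```

In this model nine segments pile up untouched and the tenth triggers the
merge. In a real 3.6 setup the usual knobs would be a lower mergeFactor on the
policy, or calling IndexWriter.maybeMerge() from an external scheduler if
time-based checks are wanted.
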
Konstantyn Smirnov | 1 Aug 2012 13:13

Re: Lucene vs SQL.

If you tokenize AND store fields in your document, you can always pull them
and re-invert using another analyzer, so you don't need to store the
"original data" somewhere else.

The point is rather the performance. I started a discussion on that topic here:
http://lucene.472066.n3.nabble.com/Performance-of-storing-data-in-Lucene-vs-other-No-SQL-Databases-tt3984704.html

--
View this message in context: http://lucene.472066.n3.nabble.com/Lucene-vs-SQL-tp3997494p3998573.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
roz dev | 2 Aug 2012 08:32

Re: Memory leak?? with CloseableThreadLocal with use of Snowball Filter

wow!! That was quick.

Thanks a ton.

On Wed, Aug 1, 2012 at 11:07 PM, Simon Willnauer
<simon.willnauer <at> gmail.com> wrote:

> On Thu, Aug 2, 2012 at 7:53 AM, roz dev <rozdev29 <at> gmail.com> wrote:
> > Thanks Robert for these inputs.
> >
> > Since we do not really need the Snowball analyzer for this field, we
> > would not use it for now. If this still does not address our issue, we
> > would tweak the thread pool as per eks dev's suggestion. I am a bit
> > hesitant to make this change yet, as we would be reducing the thread
> > pool, which can adversely impact our throughput.
> >
> > If the Snowball Filter is being optimized for Solr 4 beta then it would
> > be great for us. If you have already filed a JIRA for this then please
> > let me know and I would like to follow it.
>
> AFAIK Robert already created an issue here:
> https://issues.apache.org/jira/browse/LUCENE-4279
> and it seems fixed. Given the massive commit last night it's already
> committed and backported so it will be in 4.0-BETA.
>
> simon

Dawid Weiss | 2 Aug 2012 08:34

Re: Memory leak?? with CloseableThreadLocal with use of Snowball Filter

http://static1.blip.pl/user_generated/update_pictures/1758685.jpg

On Thu, Aug 2, 2012 at 8:32 AM, roz dev <rozdev29 <at> gmail.com> wrote:
> wow!! That was quick.
>
> Thanks a ton.
>
>
> On Wed, Aug 1, 2012 at 11:07 PM, Simon Willnauer
> <simon.willnauer <at> gmail.com> wrote:
>
>> On Thu, Aug 2, 2012 at 7:53 AM, roz dev <rozdev29 <at> gmail.com> wrote:
>> > Thanks Robert for these inputs.
>> >
>> > Since we do not really need the Snowball analyzer for this field, we
>> > would not use it for now. If this still does not address our issue, we
>> > would tweak the thread pool as per eks dev's suggestion. I am a bit
>> > hesitant to make this change yet, as we would be reducing the thread
>> > pool, which can adversely impact our throughput.
>> >
>> > If the Snowball Filter is being optimized for Solr 4 beta then it would
>> > be great for us. If you have already filed a JIRA for this then please
>> > let me know and I would like to follow it.
>>
>> AFAIK Robert already created an issue here:
>> https://issues.apache.org/jira/browse/LUCENE-4279
>> and it seems fixed. Given the massive commit last night it's already

Carsten Schnober | 2 Aug 2012 10:19

Re: Small Vocabulary

On 31.07.2012 12:10, Ian Lea wrote:

Hi Ian,

> Lucene 4.0 allows you to use custom codecs and there may be one that
> would be better for this sort of data, or you could write one.
> 
> In your tests is it the searching that is slow or are you reading lots
> of data for lots of docs?  The latter is always likely to be slow.
> General performance advice as in
> http://wiki.apache.org/lucene-java/ImproveSearchingSpeed may be
> relevant.  SSDs and loads of RAM never hurt.

You are quite right: there are many results from many docs for the
slower searches performed on that index. However, I am still wondering
about the theoretical implications: a small vocabulary with many tokens
in an inverted index would yield rather long lists of occurrences for
some, many, or all of the search terms (depending on the actual
distribution).
Thanks for your pointer to the codecs in Lucene 4; I suppose that will
be the actual point of attack for this scenario. It may be a silly
question, but one that might be of interest to the whole community ;-)
Can someone point me to in-depth documentation of Lucene 4 codecs,
ideally covering both theoretical background and implementation? There
are numerous helpful blog entries, presentations, etc. available on the
net, but if there is a central resource, I have not been able to find
it.
Thanks!
Best regards,
Carsten

Zhang, Lisheng | 2 Aug 2012 19:55

lucene Indexer failed to close, but later indexing still OK?

Hi,
 
We are using Lucene 2.3.2 on Linux/Ubuntu (we will upgrade Lucene soon). Recently we got this exception:
 
read past EOF
java.io.IOException: read past EOF
at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:130)
at org.apache.lucene.index.CompoundFileReader$CSIndexInput.readInternal(CompoundFileReader.java:240)
at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:152)
at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:76)
at org.apache.lucene.index.TermBuffer.read(TermBuffer.java:63)
at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:123)
at org.apache.lucene.index.SegmentTermEnum.scanTo(SegmentTermEnum.java:154)
at org.apache.lucene.index.TermInfosReader.scanEnum(TermInfosReader.java:223)
at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:217)
at org.apache.lucene.index.SegmentTermDocs.seek(SegmentTermDocs.java:54)
at org.apache.lucene.index.IndexReader.termDocs(IndexReader.java:668)
at org.apache.lucene.index.IndexReader.deleteDocuments(IndexReader.java:756)
at org.apache.lucene.index.IndexWriter.applyDeletes(IndexWriter.java:3512)
...
 
This happened when we tried to commit the change. But when we later ran CheckIndex, it reported no errors
at all, and we can still index without problems.
 
Is this a known issue? (I searched Google; it is said that this error may occur when applying 4.0 to old
index data, but we use Lucene 2.3.2 throughout.)
 
Thanks very much for your help!
 
Lisheng

developer3459 | 2 Aug 2012 22:03

Re: BlockJoinQuery Clarification


Is there a way to delete a parent doc from the collection, or delete a child
doc from the collection? If so, will deleting the parent doc of a collection
orphan the associated child docs or will they automatically be deleted as
well? 

thanks
-D

--
View this message in context: http://lucene.472066.n3.nabble.com/BlockJoinQuery-Clarification-tp3848728p3998891.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
Michael McCandless | 2 Aug 2012 22:06

Re: BlockJoinQuery Clarification

You must delete all children when you delete the parent.

I believe you can delete individual children and leave the parent undeleted.

Mike McCandless

http://blog.mikemccandless.com

On Thu, Aug 2, 2012 at 4:03 PM, developer3459 <developer3459 <at> gmail.com> wrote:
>
> Is there a way to delete a parent doc from the collection, or delete a child
> doc from the collection? If so, will deleting the parent doc of a collection
> orphan the associated child docs or will they automatically be deleted as
> well?
>
> thanks
> -D
>
>
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/BlockJoinQuery-Clarification-tp3848728p3998891.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
developer3459 | 2 Aug 2012 22:28

Re: BlockJoinQuery Clarification

Is there a quick/easy way to delete the entire collection at once? I'm
looking to delete and replace the entire collection in one fell swoop.

--
View this message in context: http://lucene.472066.n3.nabble.com/BlockJoinQuery-Clarification-tp3848728p3998902.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
Bill Chesky | 2 Aug 2012 23:09

Analyzer on query question

Hi,

I understand that generally speaking you should use the same analyzer on querying as was used on indexing. 
In my code I am using the SnowballAnalyzer on index creation.  However, on the query side I am building up a
complex BooleanQuery from other BooleanQuerys and/or PhraseQuerys on several fields.  None of these
require specifying an analyzer anywhere.  This is causing some odd results, I think, because a different
analyzer (or no analyzer?) is being used for the query.

Question: how do I build my boolean and phrase queries using the SnowballAnalyzer?

One thing I did that seemed to kind of work was to build my complex query normally then build a
snowball-analyzed query using a QueryParser instantiated with a SnowballAnalyzer.  To do this, I simply
pass the string value of the complex query to the QueryParser.parse() method to get the new query. 
Something like this:

    // build a complex query from other BooleanQuerys and PhraseQuerys
    BooleanQuery fullQuery = buildComplexQuery();
    QueryParser parser = new QueryParser(Version.LUCENE_30, "title",
            new SnowballAnalyzer(Version.LUCENE_30, "English"));
    Query snowballAnalyzedQuery = parser.parse(fullQuery.toString());

    TopScoreDocCollector collector = TopScoreDocCollector.create(10000, true);
    indexSearcher.search(snowballAnalyzedQuery, collector);

Like I said, this seems to kind of work but it doesn't feel right.  Does this make sense?  Is there a better way?

thanks in advance,

Bill
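
For context on the question above: TermQuery and PhraseQuery use their terms
exactly as given, so a hand-built query only matches if its terms were already
put through the same analysis as the indexed text. Round-tripping through
fullQuery.toString() is fragile because, as far as I know, toString() is not
guaranteed to produce parseable syntax. The usual alternative is to analyze
each piece of user text first (in Lucene, via the analyzer's
tokenStream(field, reader)) and build the queries from the resulting tokens.
The sketch below shows the idea in plain Java rather than Lucene: the class
ToyAnalyzedSearch and its toy analyze() method (lowercasing plus stripping a
plural "s") are hypothetical stand-ins for a real analyzer and index.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Toy inverted index showing why query terms need index-time analysis. Hypothetical, not Lucene. */
final class ToyAnalyzedSearch {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Stand-in analyzer: lowercase, split on non-alphanumerics, strip a trailing plural "s". */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (raw.isEmpty()) {
                continue;
            }
            String token = raw;
            if (token.length() > 3 && token.endsWith("s")) {
                token = token.substring(0, token.length() - 1);
            }
            tokens.add(token);
        }
        return tokens;
    }

    /** Index time: the text is analyzed before it is stored in the postings. */
    void index(int docId, String text) {
        for (String token : analyze(text)) {
            postings.computeIfAbsent(token, k -> new HashSet<>()).add(docId);
        }
    }

    /** Like a hand-built term query: the term is looked up exactly as given. */
    Set<Integer> searchRaw(String term) {
        return postings.getOrDefault(term, Collections.<Integer>emptySet());
    }

    /** Analyzes the query text first, then requires every resulting token to match. */
    Set<Integer> searchAnalyzed(String text) {
        Set<Integer> result = null;
        for (String token : analyze(text)) {
            Set<Integer> docs = postings.getOrDefault(token, Collections.<Integer>emptySet());
            if (result == null) {
                result = new HashSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new HashSet<Integer>() : result;
    }

    public static void main(String[] args) {
        ToyAnalyzedSearch search = new ToyAnalyzedSearch();
        search.index(1, "Running Dogs");
        System.out.println("raw 'Dogs':      " + search.searchRaw("Dogs"));
        System.out.println("analyzed 'Dogs': " + search.searchAnalyzed("Dogs"));
    }
}
```

The raw lookup misses because the index holds the analyzed form of the word,
while the analyzed lookup finds the document. The same reasoning applies per
field when assembling BooleanQuery clauses by hand.
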
