Konstantyn Smirnov | 1 Aug 13:03 2012

Re: is there a way to control when merges happen?

Hi Mike.

I have a LogDocMergePolicy + ConcurrentMergeScheduler in my setup.
I tried adding several new segments in a row, each with 800-5000
documents, but the scheduler seemed to ignore them at first; only after
some time did it manage to merge some of them.
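
As a rough illustration of why a few new segments may sit unmerged for a while: a LogDocMergePolicy-style policy groups segments by size into levels and only proposes a merge once one level holds mergeFactor segments. The following is a simplified, hypothetical model in plain Java, not Lucene's actual code. It assumes integer log-base-mergeFactor levels, a mergeFactor of 10 (the LogMergePolicy default) and a 1000-doc floor standing in for minMergeDocs, and it ignores the fuzzier level boundaries the real policy uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical model of a LogDocMergePolicy-style decision.
// NOT Lucene's implementation: levels here are plain integer
// floor(log_mergeFactor(docs)), and small segments are rounded up to minMergeDocs.
final class LevelMergeSketch {

    /** Integer floor of log base mergeFactor, after applying the minMergeDocs floor. */
    static int level(int docCount, int mergeFactor, int minMergeDocs) {
        int size = Math.max(docCount, minMergeDocs);
        int level = 0;
        while (size >= mergeFactor) {
            size /= mergeFactor;
            level++;
        }
        return level;
    }

    /** True if any level already holds at least mergeFactor segments. */
    static boolean wouldMerge(List<Integer> segmentDocCounts, int mergeFactor, int minMergeDocs) {
        Map<Integer, Integer> segmentsPerLevel = new HashMap<>();
        for (int docs : segmentDocCounts) {
            int lvl = level(docs, mergeFactor, minMergeDocs);
            int count = segmentsPerLevel.merge(lvl, 1, Integer::sum);
            if (count >= mergeFactor) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumed settings: mergeFactor 10, minMergeDocs floor of 1000.
        List<Integer> nine = Arrays.asList(800, 1200, 3000, 5000, 900, 2500, 4000, 1500, 2000);
        System.out.println(wouldMerge(nine, 10, 1000)); // prints false
        List<Integer> ten = new ArrayList<>(nine);
        ten.add(1000);
        System.out.println(wouldMerge(ten, 10, 1000)); // prints true
    }
}
```

Under these assumptions all nine segments of 800-5000 docs land on the same level and nothing is merged; adding a tenth segment of that size tips the level over.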

I have the option of using a Quartz scheduler to trigger my merges, but I
would like to keep that logic where it really belongs: in Lucene's
MergeScheduler.

Is there a way to control merge scheduling now (with 3.6.0)?
When exactly is the scheduler triggered: upon adding a new segment, or does
it run every n hours? Can I configure the scheduler to do both?
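
If a time-based trigger is wanted in addition to whatever IndexWriter does on its own, Quartz is not strictly necessary: a plain ScheduledExecutorService can run a merge check at a fixed rate. A minimal sketch under stated assumptions: PeriodicMergeTrigger is a hypothetical helper, not a Lucene class, and the Runnable you hand it would, in a real 3.6 setup, presumably wrap IndexWriter.maybeMerge() and deal with the IOException that method declares.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: runs a merge check periodically on one background thread.
// mergeCheck is a placeholder; it is assumed to wrap something like writer.maybeMerge().
final class PeriodicMergeTrigger implements AutoCloseable {
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor();

    PeriodicMergeTrigger(Runnable mergeCheck, long period, TimeUnit unit) {
        executor.scheduleAtFixedRate(() -> {
            try {
                mergeCheck.run();
            } catch (RuntimeException e) {
                // An exception escaping the task would suppress all later runs.
                e.printStackTrace();
            }
        }, period, period, unit);
    }

    /** Stops the background thread; no further checks are started. */
    @Override
    public void close() {
        executor.shutdownNow();
    }
}
```

scheduleAtFixedRate suppresses all later executions once a run throws, which is why the task catches and logs RuntimeExceptions instead of letting them escape.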

TIA

--
View this message in context: http://lucene.472066.n3.nabble.com/is-there-a-way-to-control-when-merges-happen-tp560736p3998571.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
Konstantyn Smirnov | 1 Aug 13:13 2012

Re: Lucene vs SQL.

If you tokenize AND store fields in your document, you can always pull them
and re-invert using another analyzer, so you don't need to store the
"original data" somewhere else.

The real question is rather performance. I started a discussion on that
topic here:
http://lucene.472066.n3.nabble.com/Performance-of-storing-data-in-Lucene-vs-other-No-SQL-Databases-tt3984704.html
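
The stored-fields point above can be illustrated with a toy in-memory model: since the original text of each document is kept, the term-to-documents map can be rebuilt later with a different analyzer, without going back to an external store. This is plain Java with two hypothetical "analyzers" (a whitespace splitter and a lowercasing variant); it does not use Lucene's Analyzer API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.Function;

// Toy illustration, not Lucene code: stored text lets us re-invert with another analyzer.
final class ReindexFromStoredFields {
    // Hypothetical "analyzer" #1: split on whitespace, keep case.
    static final Function<String, List<String>> WHITESPACE =
            text -> Arrays.asList(text.trim().split("\\s+"));
    // Hypothetical "analyzer" #2: lowercase first, then split on whitespace.
    static final Function<String, List<String>> LOWERCASE_WHITESPACE =
            text -> WHITESPACE.apply(text.toLowerCase(Locale.ROOT));

    /** Builds term -> sorted set of doc ids from the stored field texts. */
    static Map<String, SortedSet<Integer>> invert(Map<Integer, String> storedFields,
                                                  Function<String, List<String>> analyzer) {
        Map<String, SortedSet<Integer>> postings = new TreeMap<>();
        for (Map.Entry<Integer, String> doc : storedFields.entrySet()) {
            for (String term : analyzer.apply(doc.getValue())) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(doc.getKey());
            }
        }
        return postings;
    }
}
```

Inverting with WHITESPACE keeps "Lucene" and "lucene" as separate terms; re-inverting the same stored text with LOWERCASE_WHITESPACE merges them into one term that matches both documents.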

--
View this message in context: http://lucene.472066.n3.nabble.com/Lucene-vs-SQL-tp3997494p3998573.html
Robert Muir | 1 Aug 17:37 2012

Re: Memory leak?? with CloseableThreadLocal with use of Snowball Filter

On Tue, Jul 31, 2012 at 2:34 PM, roz dev <rozdev29 <at> gmail.com> wrote:
> Hi All
>
> I am using Solr 4 from trunk and using it with Tomcat 6. I am noticing that
> when we are indexing lots of data with 16 concurrent threads, Heap grows
> continuously. It remains high and ultimately most of the stuff ends up
> being moved to Old Gen. Eventually, Old Gen also fills up and we start
> getting into excessive GC problem.

Hi: I don't claim to know anything about how Tomcat manages threads,
but really you shouldn't have all these objects.

In general snowball stemmers should be reused per-thread-per-field.
But if you have a lot of fields*threads, especially if there really is
high thread churn on tomcat, then this could be bad with snowball:
see eks dev's comment on https://issues.apache.org/jira/browse/LUCENE-3841

I think it would be useful to see if you can tune tomcat's threadpool
as he describes.
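
The per-thread-per-field reuse pattern, and why thread churn defeats it, can be modelled with a plain ThreadLocal. In this hypothetical sketch FakeStemmer stands in for a Snowball stemmer, and PerThreadStemmerCache is not Lucene's CloseableThreadLocal; it only counts how many stemmers get built.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model: each thread lazily builds one stemmer per field and reuses it.
final class PerThreadStemmerCache {
    static final AtomicInteger CREATED = new AtomicInteger();

    // Hypothetical stand-in for an expensive object such as a Snowball stemmer.
    static final class FakeStemmer {
        final String field;
        FakeStemmer(String field) {
            this.field = field;
            CREATED.incrementAndGet();
        }
    }

    // One map of field -> stemmer per thread.
    private static final ThreadLocal<Map<String, FakeStemmer>> CACHE =
            ThreadLocal.withInitial(HashMap::new);

    static FakeStemmer stemmerFor(String field) {
        return CACHE.get().computeIfAbsent(field, FakeStemmer::new);
    }

    /** High churn: every job runs on a brand-new thread. */
    static void runWithFreshThreads(int jobs, String... fields) throws InterruptedException {
        for (int i = 0; i < jobs; i++) {
            Thread t = new Thread(() -> {
                for (String f : fields) stemmerFor(f);
            });
            t.start();
            t.join();
        }
    }

    /** No churn: all jobs run on one long-lived thread. */
    static void runOnOneThread(int jobs, String... fields) throws InterruptedException {
        Thread t = new Thread(() -> {
            for (int i = 0; i < jobs; i++) {
                for (String f : fields) stemmerFor(f);
            }
        });
        t.start();
        t.join();
    }
}
```

With 100 jobs over two fields, one long-lived thread builds 2 stemmers, while starting a fresh thread per job builds 200, which is the fields * threads growth described above.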

Separately: Snowball stemmers are currently really RAM-expensive for
stupid reasons.
Each one creates a ton of Among objects, e.g. an EnglishStemmer today
is about 8KB.

I'll regenerate these and open a JIRA issue: the Snowball code
generator in their svn was improved recently, and with it each stemmer
now takes about 64 bytes instead (the Amongs are static and reused).
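
The fix described above, where generated stemmers share static Among tables instead of building their own, boils down to the difference below. This is a hypothetical sketch: Among is a stand-in class, and the three suffix entries are invented for illustration, not copied from the real EnglishStemmer.

```java
// Hypothetical sketch of per-instance vs shared static lookup tables.
final class AmongTableSketch {
    // Stand-in for a Snowball "Among" entry: a suffix and an action code.
    static final class Among {
        final String suffix;
        final int result;
        Among(String suffix, int result) {
            this.suffix = suffix;
            this.result = result;
        }
    }

    // Before: each stemmer instance allocates its own copy of the table,
    // so N live instances hold N copies.
    static final class PerInstanceStemmer {
        final Among[] amongs = {
            new Among("ing", 1), new Among("ed", 2), new Among("ly", 3)
        };
    }

    // After: the table is built once when the class is initialized,
    // never modified, and shared by every instance.
    static final class SharedTableStemmer {
        private static final Among[] AMONGS = {
            new Among("ing", 1), new Among("ed", 2), new Among("ly", 3)
        };
        Among[] amongs() {
            return AMONGS;
        }
    }
}
```

With the per-instance variant, every stemmer kept alive per thread per field carries its own table; with the shared variant the table exists once per class.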

roz dev | 2 Aug 07:53 2012

Re: Memory leak?? with CloseableThreadLocal with use of Snowball Filter

Thanks Robert for these inputs.

Since we do not really need the Snowball analyzer for this field, we will not
use it for now. If this still does not address our issue, we will tweak the
thread pool as per eks dev's suggestion. I am a bit hesitant to make this
change yet, as reducing the thread pool could adversely impact our throughput.

If the Snowball filter is being optimized for Solr 4 beta, that would be
great for us. If you have already filed a JIRA for this, please let me
know; I would like to follow it.

Thanks again
Saroj

Simon Willnauer | 2 Aug 08:07 2012

Re: Memory leak?? with CloseableThreadLocal with use of Snowball Filter

On Thu, Aug 2, 2012 at 7:53 AM, roz dev <rozdev29 <at> gmail.com> wrote:
> Thanks Robert for these inputs.
>
> Since we do not really need the Snowball analyzer for this field, we would not use
> it for now. If this still does not address our issue, we would tweak thread
> pool as per eks dev's suggestion; I am a bit hesitant to do this change yet as
> we would be reducing thread pool which can adversely impact our throughput
>
> If Snowball Filter is being optimized for Solr 4 beta then it would be
> great for us. If you have already filed a JIRA for this then please let me
> know and I would like to follow it

AFAIK Robert already created an issue here:
https://issues.apache.org/jira/browse/LUCENE-4279
and it seems to be fixed. Given the massive commit last night, it's already
committed and backported, so it will be in 4.0-BETA.

simon

roz dev | 2 Aug 08:32 2012

Re: Memory leak?? with CloseableThreadLocal with use of Snowball Filter

Wow!! That was quick.

Thanks a ton.

Dawid Weiss | 2 Aug 08:34 2012

Re: Memory leak?? with CloseableThreadLocal with use of Snowball Filter

http://static1.blip.pl/user_generated/update_pictures/1758685.jpg

Laurent Vaills | 2 Aug 09:13 2012

Re: Memory leak?? with CloseableThreadLocal with use of Snowball Filter

Hi everyone,

Is there any chance of getting this backported for a 3.6.2 release?

Regards,
Laurent

Carsten Schnober | 2 Aug 10:19 2012

Re: Small Vocabulary

On 31.07.2012 12:10, Ian Lea wrote:

Hi Ian,

> Lucene 4.0 allows you to use custom codecs and there may be one that
> would be better for this sort of data, or you could write one.
> 
> In your tests is it the searching that is slow or are you reading lots
> of data for lots of docs?  The latter is always likely to be slow.
> General performance advice as in
> http://wiki.apache.org/lucene-java/ImproveSearchingSpeed may be
> relevant.  SSDs and loads of RAM never hurt.

You are quite right, there are many results from many docs for the
slower searches performed on that index. However, I am still wondering
about the theoretical implications: in an inverted index, a small
vocabulary with many tokens yields rather long lists of occurrences
for some, many, or all of the search terms, depending on the actual
term distribution.
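
The effect described above is easy to see in a toy inverted index: every token occurrence adds one entry to its term's postings list, so the total number of entries equals the total number of tokens, and with a tiny vocabulary those entries are spread over only a few very long lists. A plain-Java sketch with a synthetic, hypothetical four-term vocabulary; this is not how Lucene actually encodes postings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy inverted index, NOT Lucene's postings format: every token occurrence
// appends one {docId, position} entry to its term's list.
final class SmallVocabularyPostings {

    /** Builds term -> list of {docId, position} entries. */
    static Map<String, List<int[]>> index(List<String[]> docs) {
        Map<String, List<int[]>> postings = new HashMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            String[] tokens = docs.get(docId);
            for (int pos = 0; pos < tokens.length; pos++) {
                postings.computeIfAbsent(tokens[pos], t -> new ArrayList<>())
                        .add(new int[] {docId, pos});
            }
        }
        return postings;
    }

    /** Hypothetical corpus: numDocs docs of docLength tokens cycling through vocab. */
    static List<String[]> syntheticDocs(int numDocs, int docLength, String... vocab) {
        List<String[]> docs = new ArrayList<>();
        for (int d = 0; d < numDocs; d++) {
            String[] tokens = new String[docLength];
            for (int p = 0; p < docLength; p++) {
                tokens[p] = vocab[(d + p) % vocab.length];
            }
            docs.add(tokens);
        }
        return docs;
    }

    public static void main(String[] args) {
        Map<String, List<int[]>> postings =
                index(syntheticDocs(1000, 100, "a", "b", "c", "d"));
        for (Map.Entry<String, List<int[]>> e : postings.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue().size() + " entries");
        }
    }
}
```

With 1,000 documents of 100 tokens each over four terms, each of the four lists ends up with 25,000 entries, so any query on one of these terms has to walk a list of that length.
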
Thanks for your pointer to the codecs in Lucene 4; I suppose that is
the actual point to attack for that scenario. It may be a silly
question, but one that might be of interest to the whole community ;-)
Can someone point me to in-depth documentation of Lucene 4 codecs,
ideally covering both the theoretical background and the implementation?
There are numerous helpful blog entries, presentations, etc. available on
the net, but if there is some central resource, I have not been able to
find it.
Thanks!
Best regards,
Carsten

Robert Muir | 2 Aug 15:00 2012

Re: Memory leak?? with CloseableThreadLocal with use of Snowball Filter

On Thu, Aug 2, 2012 at 3:13 AM, Laurent Vaills <laurent.vaills <at> gmail.com> wrote:
> Hi everyone,
>
> Is there any chance to get this backported for a 3.6.2 ?
>

Hello, I personally have no problem with it, but it's really
technically not a bugfix, just an optimization.

It also doesn't solve the actual problem if your Tomcat
thread pool configuration recycles threads too fast. There will be
other performance problems.

-- 
lucidimagination.com

