Doug Cutting | 1 Mar 2006 01:07

Re: version issue ?

Andi Vajda wrote:
> The announcement about the Java Lucene 1.9 final release was made today 
> but common-build.xml still lists "1.9-rc1-dev" for the "version" 
> property both on the 1.9 branch and on the trunk. Is this an oversight ?

I wouldn't read too much into it.  It's convenient to have trunk and the 
1.9 branch synchronized as long as possible.  Said another way, it's 
inconvenient when they diverge, since merging becomes harder.

I made the 1.9 release yesterday, then waited 24 hours for it to get 
mirrored.  Today I announced it to the developer list.  If no problems 
are reported, then tomorrow I'll announce it to the user list.  Once 
it's really out the door, we should start moving trunk to 2.0.  We 
could start doing that now, since we have a branch, and it probably 
wouldn't cause any inconvenience.  This is a long way of saying that the 
version number in trunk will probably change to 2.0-dev tomorrow.

If we end up making changes in the 1.9 branch, with the intent of making 
a 1.9.1 release, then we should update the version in that branch to be 
1.9.1-dev.  We could do that now if we like, but I probably won't bother 
to do it until I apply a patch to that branch.

Doug
Andi Vajda | 1 Mar 2006 01:16

Re: version issue ?


On Tue, 28 Feb 2006, Doug Cutting wrote:

> Andi Vajda wrote:
>> The announcement about the Java Lucene 1.9 final release was made today but 
>> common-build.xml still lists "1.9-rc1-dev" for the "version" property both 
>> on the 1.9 branch and on the trunk. Is this an oversight ?
>
> I wouldn't read too much into it.  It's convenient to have trunk and the 1.9 
> branch synchronized as long as possible.  Said another way, it's inconvenient 
> when they diverge, since merging becomes harder.
>
> I made the 1.9 release yesterday, then waited 24 hours for it to get 
> mirrored.  Today I announced it to the developer list.  If no problems are 
> reported, then tomorrow I'll announce it to the user list.  Once it's really 
> out the door, we should start moving trunk to 2.0.  We could start doing 
> that now, since we have a branch, and it probably wouldn't cause any 
> inconvenience.  This is a long way of saying that the version number in trunk 
> will probably change to 2.0-dev tomorrow.
>
> If we end up making changes in the 1.9 branch, with the intent of making a 
> 1.9.1 release, then we should update the version in that branch to be 
> 1.9.1-dev.  We could do that now if we like, but I probably won't bother to 
> do it until I apply a patch to that branch.

Understood. What threw me off was the 'rc1-dev' part of the version on the 
branch. I'd expect it to say 1.9 or 1.9-final since the .jar files produced 
by a 1.9 branch checkout all say 1.9-rc1-dev even though the branch is past 
that now.


Doug Cutting | 1 Mar 2006 01:23

Re: version issue ?

Andi Vajda wrote:
> Understood. What threw me off was the 'rc1-dev' part of the version on 
> the branch. I'd expect it to say 1.9 or 1.9-final since the .jar files 
> produced by a 1.9 branch checkout all say 1.9-rc1-dev even though the 
> branch is past that now.

Previously I've always incremented the version before making the 
release, and this always seemed to confuse folks.  This time I tried not 
incrementing it, and that seems to confuse folks too!

My one concern is that I don't think code that folks download should 
build something called 1.9-final, as that could be confusing: we should 
reserve names that sound like real releases for real releases, compiled 
with the correct JVM, etc.

If you have strong feelings about what should be in these I'd love to 
hear them!

Doug
Andi Vajda | 1 Mar 2006 01:42

Re: version issue ?


On Tue, 28 Feb 2006, Doug Cutting wrote:

> Andi Vajda wrote:
>> Understood. What threw me off was the 'rc1-dev' part of the version on the 
>> branch. I'd expect it to say 1.9 or 1.9-final since the .jar files produced 
>> by a 1.9 branch checkout all say 1.9-rc1-dev even though the branch is past 
>> that now.
>
> Previously I've always incremented the version before making the release, and 
> this always seemed to confuse folks.  This time I tried not incrementing it, 
> and that seems to confuse folks too!
>
> My one concern is that I don't think code that folks download should build 
> something called 1.9-final, as that could be confusing: we should reserve 
> names that sound like real releases for real releases, compiled with the 
> correct JVM, etc.
>
> If you have strong feelings about what should be in these I'd love to hear 
> them!

No particularly strong feelings here, I understand the trade-offs now.

It would seem to me that the source code snapshot that is made to release 
'official' source and binary tarballs on the Lucene website should correspond 
to a precise svn version and that that version's common-build.xml should 
reflect the release number, i.e. 1.9 or 1.9-final, in its "version" property.

If the 1.9 branch were to be modified for making, say, a 1.9.1 release, the 
first change should be in common-build.xml for the version to say something 

Steven Tamm (JIRA | 1 Mar 2006 03:34

[jira] Created: (LUCENE-502) TermScorer caches values unnecessarily

TermScorer caches values unnecessarily
--------------------------------------

         Key: LUCENE-502
         URL: http://issues.apache.org/jira/browse/LUCENE-502
     Project: Lucene - Java
        Type: Improvement
  Components: Search  
    Versions: 1.9    
    Reporter: Steven Tamm

TermScorer aggressively caches the doc and freq of 32 documents at a time for each term scored.  When
querying for a lot of terms, this causes a lot of garbage to be created that's unnecessary.  The
SegmentTermDocs from which it retrieves its information doesn't have any optimizations for bulk
loading, and it's unnecessary.

In addition, it has a SCORE_CACHE, that's of limited benefit.  It's caching the result of a sqrt that should
be placed in DefaultSimilarity, and if you're only scoring a few documents that contain those terms,
there's no need to precalculate the SQRT, especially on modern VMs.

Enclosed is a patch that replaces TermScorer with a version that does not cache the docs or freqs.  In the case
of a lot of queries, that saves 196 bytes/term, the unnecessary disk IO, and extra SQRTs, which adds up.

--

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

Steven Tamm (JIRA | 1 Mar 2006 06:34

[jira] Updated: (LUCENE-502) TermScorer caches values unnecessarily

     [ http://issues.apache.org/jira/browse/LUCENE-502?page=all ]

Steven Tamm updated LUCENE-502:
-------------------------------

    Attachment: TermScorer.patch

Here's the patch

Sorry about my lack of proofreading, I saved right as I was leaving work.  

The main point is that the look ahead caching done by TermScorer is unnecessary.  It is only of benefit if you
are scoring in a given locality (i.e. query doc 0, then 30, then 10, then 3, etc).  Nearly all use cases are
sequential: the use of seek vs. next() is fine because the underlying BufferedIndexInput has an
efficient seek function for sequential access.  

Here's an HPROF run from a set of sequential wildcard searches (with many terms per search).  Since this
never performs sequential access on documents, the "cache" is completely unnecessary. 

          percent          live          alloc'ed  stack class
 rank   self  accum     bytes objs     bytes  objs trace name
   29  0.79% 58.64%   1029312 7148   1801296 12509 387945 float[]
   30  0.79% 59.43%   1029312 7148   1801296 12509 387944 int[]
   31  0.79% 60.23%   1029312 7148   1801296 12509 387943 int[]

TRACE 387943:
	org.apache.lucene.search.TermScorer.<init>(TermScorer.java:30)
	org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:64)
	org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:165)
	org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:158)
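
As a rough check of the HPROF rows above, here is a minimal Java sketch of the arithmetic. It divides the reported byte totals by the object counts to get the size of one array. The 16-byte array header and the reading of the three rows as "three arrays per TermScorer" are assumptions made for illustration, not something the profile states; the class and method names are hypothetical.

```java
// Sketch only: arithmetic on the HPROF figures quoted above.
class HprofAllocationSketch {

    // Size of one array instance: total bytes divided by number of objects.
    static long bytesPerObject(long totalBytes, long objectCount) {
        return totalBytes / objectCount;
    }

    // Element count implied by an instance size, given an ASSUMED header
    // size and element width (both depend on the JVM and array type).
    static long impliedLength(long instanceBytes, long headerBytes, long elementBytes) {
        return (instanceBytes - headerBytes) / elementBytes;
    }

    public static void main(String[] args) {
        // "live" columns: 1029312 bytes in 7148 objects.
        long livePerArray = bytesPerObject(1_029_312L, 7_148L);
        // "alloc'ed" columns: 1801296 bytes in 12509 objects.
        long allocPerArray = bytesPerObject(1_801_296L, 12_509L);

        // ASSUMPTION: 16-byte array header, 4-byte int/float elements.
        long length = impliedLength(livePerArray, 16L, 4L);

        // ASSUMPTION: the float[] row and the two int[] rows are the three
        // arrays allocated by one TermScorer.
        long perScorer = 3 * livePerArray;

        System.out.println("bytes per array (live):     " + livePerArray);
        System.out.println("bytes per array (alloc'ed): " + allocPerArray);
        System.out.println("implied array length:       " + length);
        System.out.println("bytes per scorer (3 arrays): " + perScorer);
    }
}
```

Under those assumptions each array comes to 144 bytes, which is consistent with 32 four-byte slots per array.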

Samphan Raruenrom (JIRA | 1 Mar 2006 11:45

[jira] Created: (LUCENE-503) Contrib: ThaiAnalyzer to enable Thai full-text search in Lucene

Contrib: ThaiAnalyzer to enable Thai full-text search in Lucene
---------------------------------------------------------------

         Key: LUCENE-503
         URL: http://issues.apache.org/jira/browse/LUCENE-503
     Project: Lucene - Java
        Type: New Feature
  Components: Analysis  
    Versions: 1.4    
    Reporter: Samphan Raruenrom

Thai text doesn't have spaces between words. Usually, a dictionary-based algorithm is used to break a string
into words. For Lucene to be usable for Thai, an Analyzer that knows how to break Thai words is needed.

I've implemented such an Analyzer, ThaiAnalyzer, using ICU4j DictionaryBasedBreakIterator for word
breaking. I'll upload the code later.

I'm normally a C++ programmer and very new to Java. Please review the code for any problems. One possible
problem is that it requires ICU4j. I don't know whether this is OK.
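
To illustrate the word-breaking step described above, here is a minimal sketch. It is not the ThaiAnalyzer from this issue, and it swaps in the JDK's own java.text.BreakIterator instead of ICU4j's DictionaryBasedBreakIterator so that it has no external dependency. Whether a given Java runtime ships Thai break data is an assumption to verify; the class and method names are hypothetical.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Sketch only: splits text into words with the JDK BreakIterator.
class WordBreakSketch {

    // Returns the segments between word boundaries that start with a
    // letter or digit, skipping whitespace and punctuation segments.
    static List<String> words(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end);
            if (Character.isLetterOrDigit(segment.codePointAt(0))) {
                out.add(segment);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Thai input has no spaces; the segmentation depends on the break
        // data available in the running JDK.
        Locale thai = Locale.forLanguageTag("th");
        System.out.println(words("ภาษาไทย", thai));
        System.out.println(words("hello, world", Locale.ENGLISH));
    }
}
```

An Analyzer built on this idea would emit one token per returned segment; the ICU4j iterator named in the issue fills the same role as BreakIterator here.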

Jörg Henß | 1 Mar 2006 13:38

FuzzyQuery / PriorityQueue BUG

Hi,
FuzzyQuery produces a "java.lang.NegativeArraySizeException" in
PriorityQueue.initialize if I use Integer.MAX_VALUE as
BooleanQuery.MaxClauseCount. This is because it adds 1 to MaxClauseCount and
tries to allocate an Array of this Size (I think it overflows to MIN_VALUE).
Usually nobody needs so many clauses, but I think this should be caught
somehow. Perhaps an Error "your MaxClauseCount is too large" could do it, so
the user knows where to find the problem.
Greets
Joerg
Bernhard Messer | 1 Mar 2006 14:19

Re: FuzzyQuery / PriorityQueue BUG

Jörg,

could you please add this to JIRA so that things don't get lost. If you 
have a patch and/or a testcase showing the problem, it would be great if 
you append it to JIRA also.

thanks,
 Bernhard

Jörg Henß wrote:

>Hi,
>FuzzyQuery produces a "java.lang.NegativeArraySizeException" in
>PriorityQueue.initialize if I use Integer.MAX_VALUE as
>BooleanQuery.MaxClauseCount. This is because it adds 1 to MaxClauseCount and
>tries to allocate an Array of this Size (I think it overflows to MIN_VALUE).
>Usually nobody needs so many clauses, but I think this should be caught
>somehow. Perhaps an Error "your MaxClauseCount is too large" could do it, so
>the user knows where to find the problem.
>Greets
>Joerg
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: java-dev-unsubscribe <at> lucene.apache.org
>For additional commands, e-mail: java-dev-help <at> lucene.apache.org

Joerg Henss (JIRA | 1 Mar 2006 15:54

[jira] Created: (LUCENE-504) FuzzyQuery produces a "java.lang.NegativeArraySizeException" in PriorityQueue.initialize if I use Integer.MAX_VALUE as BooleanQuery.MaxClauseCount

FuzzyQuery produces a "java.lang.NegativeArraySizeException" in PriorityQueue.initialize if I use
Integer.MAX_VALUE as BooleanQuery.MaxClauseCount
--------------------------------------------------------------------------------------------------------------------------------------------------

         Key: LUCENE-504
         URL: http://issues.apache.org/jira/browse/LUCENE-504
     Project: Lucene - Java
        Type: Bug
  Components: Search  
    Versions: 1.9    
    Reporter: Joerg Henss
    Priority: Minor

PriorityQueue throws a "java.lang.NegativeArraySizeException" when initialized with
Integer.MAX_VALUE, because the integer overflows. I think this could be a general problem with
PriorityQueue. The error occurred when I set BooleanQuery.MaxClauseCount to Integer.MAX_VALUE and
used a FuzzyQuery for searching.
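
The overflow can be reproduced outside Lucene. The following is a minimal sketch, not Lucene's PriorityQueue code: it assumes, as the report suggests, that the heap array is sized as the maximum size plus one, and it shows one possible guard with an explicit error message. The class and method names are hypothetical.

```java
// Sketch only: reproduces the int overflow behind the exception.
class MaxClauseOverflowSketch {

    // ASSUMED sizing rule from the report: one extra slot on top of maxSize.
    // For Integer.MAX_VALUE this wraps around to Integer.MIN_VALUE.
    static int heapArraySize(int maxSize) {
        return maxSize + 1;
    }

    // One possible guard: reject sizes that cannot be incremented safely.
    static int checkedHeapArraySize(int maxSize) {
        if (maxSize < 0 || maxSize == Integer.MAX_VALUE) {
            throw new IllegalArgumentException(
                "maxSize is too large or negative: " + maxSize);
        }
        return maxSize + 1;
    }

    public static void main(String[] args) {
        int size = heapArraySize(Integer.MAX_VALUE);
        System.out.println("computed array size: " + size);

        try {
            Object[] heap = new Object[size]; // negative length
            System.out.println("allocated " + heap.length + " slots");
        } catch (NegativeArraySizeException e) {
            System.out.println("unguarded allocation failed: " + e);
        }

        try {
            checkedHeapArraySize(Integer.MAX_VALUE);
        } catch (IllegalArgumentException e) {
            System.out.println("guarded version failed early: " + e.getMessage());
        }
    }
}
```

A check of this kind turns the NegativeArraySizeException into a message that points at the oversized setting, which is what the original report asks for.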

