hudson | 1 Jan 2008 03:27

Build failed in Hudson: Lucene-Nightly #321

See http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/321/changes

------------------------------------------
[...truncated 866 lines...]
A         contrib/db/bdb-je/src/java
A         contrib/db/bdb-je/src/java/org
A         contrib/db/bdb-je/src/java/org/apache
A         contrib/db/bdb-je/src/java/org/apache/lucene
A         contrib/db/bdb-je/src/java/org/apache/lucene/store
A         contrib/db/bdb-je/src/java/org/apache/lucene/store/je
A         contrib/db/bdb-je/src/java/org/apache/lucene/store/je/File.java
A         contrib/db/bdb-je/src/java/org/apache/lucene/store/je/JEDirectory.java
A         contrib/db/bdb-je/src/java/org/apache/lucene/store/je/JEIndexInput.java
A         contrib/db/bdb-je/src/java/org/apache/lucene/store/je/JEIndexOutput.java
A         contrib/db/bdb-je/src/java/org/apache/lucene/store/je/JELock.java
A         contrib/db/bdb-je/src/java/org/apache/lucene/store/je/Block.java
A         contrib/db/bdb-je/build.xml
A         contrib/db/bdb
A         contrib/db/bdb/pom.xml.template
A         contrib/db/bdb/src
A         contrib/db/bdb/src/test
A         contrib/db/bdb/src/test/org
A         contrib/db/bdb/src/test/org/apache
A         contrib/db/bdb/src/test/org/apache/lucene
A         contrib/db/bdb/src/test/org/apache/lucene/store
A         contrib/db/bdb/src/test/org/apache/lucene/store/db
A         contrib/db/bdb/src/test/org/apache/lucene/store/db/DbStoreTest.java
AU        contrib/db/bdb/src/test/org/apache/lucene/store/db/SanityLoadLibrary.java
A         contrib/db/bdb/src/java
A         contrib/db/bdb/src/java/org

Doron Cohen | 1 Jan 2008 07:55

Re: DocumentsWriter.checkMaxTermLength issues

On Dec 31, 2007 7:54 PM, Michael McCandless <lucene <at> mikemccandless.com>
wrote:

> I actually think indexing should try to be as robust as possible.  You
> could test like crazy and never hit a massive term, go into production
> (say, ship your app to lots of your customer's computers) only to
> suddenly see this exception.  In general it could be a long time before
> you "accidentally" our users see this.
>
> So I'm thinking we should have the default behavior, in IndexWriter,
> be to skip immense terms?
>
> Then people can use TokenFilter to change this behavior if they want.
>

+1

At first I saw this similar to IndexWriter.setMaxFieldLength(), but it was
a wrong comparison, because #terms is a "real" indexing/search
characteristic that many applications can benefit from being able
to modify, whereas a huge token is in most cases a bug.

Just to make sure on the scenario - the only change is to skip too long
tokens, while any other exception is thrown (not ignored.)

Also, for a skipped token I think the position increment of the
following token should be incremented.
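
To make the two points above concrete (skip over-long tokens, and credit the skipped position to the following token), here is a minimal, self-contained Java sketch. It does not use Lucene's classes: `Token`, `filter` and the `maxTermLength` parameter are hypothetical stand-ins for illustration, not the actual DocumentsWriter or TokenFilter code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SkipLongTokens {

    /** Hypothetical minimal token: its text and its position increment. */
    static final class Token {
        final String text;
        final int positionIncrement;

        Token(String text, int positionIncrement) {
            this.text = text;
            this.positionIncrement = positionIncrement;
        }
    }

    /**
     * Drops every token longer than maxTermLength and adds the increments
     * of the dropped tokens to the next token that is kept.
     */
    static List<Token> filter(List<Token> in, int maxTermLength) {
        List<Token> out = new ArrayList<>();
        int skipped = 0; // increments of dropped tokens not yet credited
        for (Token t : in) {
            if (t.text.length() > maxTermLength) {
                skipped += t.positionIncrement;
                continue;
            }
            out.add(new Token(t.text, t.positionIncrement + skipped));
            skipped = 0;
        }
        return out;
    }

    public static void main(String[] args) {
        List<Token> in = Arrays.asList(
                new Token("climate", 1),
                new Token("xxxxxxxxxxxxxxxxxxxxxxxx", 1), // stands in for an immense term
                new Token("change", 1));
        for (Token t : filter(in, 10)) {
            System.out.println(t.text + " +" + t.positionIncrement);
        }
    }
}
```

With a limit of 10 characters the middle token is dropped and "change" is emitted with an increment of 2, so the gap left by the skipped token stays visible in the positions.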

Michael McCandless | 1 Jan 2008 11:50

Re: DocumentsWriter.checkMaxTermLength issues


Doron Cohen wrote:

> On Dec 31, 2007 7:54 PM, Michael McCandless  
> <lucene <at> mikemccandless.com>
> wrote:
>
>> I actually think indexing should try to be as robust as possible.   
>> You
>> could test like crazy and never hit a massive term, go into  
>> production
>> (say, ship your app to lots of your customer's computers) only to
>> suddenly see this exception.  In general it could be a long time  
>> before
>> you "accidentally" our users see this.
>>
>> So I'm thinking we should have the default behavior, in IndexWriter,
>> be to skip immense terms?
>>
>> Then people can use TokenFilter to change this behavior if they want.
>>
>
> +1

OK I will take this approach.

> At first I saw this similar to IndexWriter.setMaxFieldLength(), but  
> it was
> a wrong comparison, because #terms is a "real" indexing/search
> characteristic that many applications can benefit from being able

Michael Busch (JIRA) | 1 Jan 2008 12:45

[jira] Resolved: (LUCENE-571) StandardTokenizer parses decimal number as <HOST>


     [
https://issues.apache.org/jira/browse/LUCENE-571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Busch resolved LUCENE-571.
----------------------------------

    Resolution: Duplicate

See LUCENE-1100.

> StandardTokenizer parses decimal number as <HOST>
> -------------------------------------------------
>
>                 Key: LUCENE-571
>                 URL: https://issues.apache.org/jira/browse/LUCENE-571
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis
>    Affects Versions: 1.9
>            Reporter: Tom Emerson
>            Priority: Minor
>
> The standard tokenizer in 1.9.1 returns a decimal number such as "3.14" as a <HOST>, though a number like
> "3,141.59" is returned as a <NUM>. I believe, though I haven't tried it yet, that moving the rule for <HOST>
> after <NUM>, instead of before it, will obviate this. Or updating <HOST> to require a TLD as the last
> component, which would require you to split the interpretation of IP addresses from name-based addresses.
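
The effect of rule order described in the report can be illustrated with a small sketch. This is not the StandardTokenizer grammar: the two regular expressions below are simplified, hypothetical patterns, and `classify` simply returns the first rule that matches the whole token, on the assumption that the earlier rule wins when both match.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class RuleOrderDemo {

    // Simplified, hypothetical patterns for illustration only.
    static final Pattern HOST = Pattern.compile("[A-Za-z0-9]+(\\.[A-Za-z0-9]+)+");
    static final Pattern NUM = Pattern.compile("[0-9]+([.,][0-9]+)+");

    /** Builds the rule list in one of the two orders under discussion. */
    static Map<String, Pattern> rules(boolean hostFirst) {
        Map<String, Pattern> rules = new LinkedHashMap<>(); // keeps insertion order
        if (hostFirst) {
            rules.put("<HOST>", HOST);
            rules.put("<NUM>", NUM);
        } else {
            rules.put("<NUM>", NUM);
            rules.put("<HOST>", HOST);
        }
        return rules;
    }

    /** Returns the name of the first rule that matches the whole token. */
    static String classify(String token, Map<String, Pattern> rules) {
        for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
            if (rule.getValue().matcher(token).matches()) {
                return rule.getKey();
            }
        }
        return "<UNKNOWN>";
    }

    public static void main(String[] args) {
        for (String token : new String[] {"3.14", "3,141.59", "example.com", "192.168.0.1"}) {
            System.out.println(token
                    + "  HOST first: " + classify(token, rules(true))
                    + "  NUM first: " + classify(token, rules(false)));
        }
    }
}
```

In this sketch "3.14" comes out as <HOST> when the <HOST> rule is listed first and as <NUM> when <NUM> is listed first, while "3,141.59" is a <NUM> either way. Note that "192.168.0.1" also becomes a <NUM> once <NUM> is first, which is the IP-address complication the report alludes to.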

--

-- 
This message is automatically generated by JIRA.

Michael Busch (JIRA) | 1 Jan 2008 12:51

[jira] Resolved: (LUCENE-396) [PATCH] Add position increment back into StopFilter


     [
https://issues.apache.org/jira/browse/LUCENE-396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Busch resolved LUCENE-396.
----------------------------------

    Resolution: Duplicate
      Assignee:     (was: Lucene Developers)

See LUCENE-1095.

> [PATCH] Add position increment back into StopFilter
> ---------------------------------------------------
>
>                 Key: LUCENE-396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-396
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis
>    Affects Versions: 1.4
>         Environment: Operating System: other
> Platform: All
>            Reporter: Mike Barry
>            Priority: Minor
>         Attachments: StopFilter.patch, TestStopFilter.java, TestStopFilter.java
>
>
> Currently, if you index a document that contains "climate of change", then a
> phrase query of "climate change" will return that document because StopFilter

Michael Busch (JIRA) | 1 Jan 2008 13:05

[jira] Resolved: (LUCENE-233) [PATCH] analyzer refactoring based on CVS HEAD from 6/21/2004


     [
https://issues.apache.org/jira/browse/LUCENE-233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Busch resolved LUCENE-233.
----------------------------------

    Resolution: Duplicate
      Assignee:     (was: Lucene Developers)

See LUCENE-210.

> [PATCH] analyzer refactoring based on CVS HEAD from 6/21/2004
> -------------------------------------------------------------
>
>                 Key: LUCENE-233
>                 URL: https://issues.apache.org/jira/browse/LUCENE-233
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>    Affects Versions: CVS Nightly - Specify date in submission
>         Environment: Operating System: All
> Platform: All
>            Reporter: Rasik Pandey
>            Priority: Minor
>         Attachments: analysis.zip
>
>
> Hello,
> As mentioned in previous exchanges, notably with Grant Ingersoll, I added some 

Timo Nentwig (JIRA) | 1 Jan 2008 13:23

[jira] Commented: (LUCENE-997) Add search timeout support to Lucene


    [
https://issues.apache.org/jira/browse/LUCENE-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12555190
] 

Timo Nentwig commented on LUCENE-997:
-------------------------------------

IMHO it definitely should be part of the core. Being able to control the runtime of queries/your resources
is crucial for every live system and I really wonder why it has taken so long to address this.

Otherwise I totally agree with Navdav: that Hits thingie is nice and fine for simple full-text queries but
as soon as things become somewhat more complex you don't get around writing your own HitCollector (and do
stuff like Facets).

I also strongly agree that the timeout HC should be implemented as a decorator (what's been called
"front-end" here). I just quickly wrote an example and attached it (no, I'm not happy throwing a runtime
exception either):

MyHitCollector hc = new MyHitCollector();
s.search(q, null, HitCollectorTimeoutDecorator.decorate( hc, 10 ) );

And finally, why return partial results? I don't think that this is reasonable.

BTW I'm not sure whether volatile in the timer thread is really working reliably in 1.4...
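
The attached HitCollectorTimeoutDecorator.java is not reproduced in this message, so the following is only a guess at the general shape of such a decorator. Assumptions: `HitCollector` here is a local stand-in that mirrors the `collect(int doc, float score)` method, the second argument to `decorate` is taken to be milliseconds, and the deadline is checked with `System.nanoTime()` on every hit rather than with the timer thread mentioned above.

```java
public class TimeoutDecoratorSketch {

    /** Local stand-in for a hit collector with collect(int doc, float score). */
    abstract static class HitCollector {
        public abstract void collect(int doc, float score);
    }

    /** Hypothetical unchecked exception used to abort the search. */
    static final class SearchTimeoutException extends RuntimeException {
        SearchTimeoutException(String message) {
            super(message);
        }
    }

    /** Wraps another collector and refuses hits once the deadline has passed. */
    static final class TimeoutCollector extends HitCollector {
        private final HitCollector delegate;
        private final long deadlineNanos;

        TimeoutCollector(HitCollector delegate, long timeoutMillis) {
            this.delegate = delegate;
            this.deadlineNanos = System.nanoTime() + timeoutMillis * 1_000_000L;
        }

        @Override
        public void collect(int doc, float score) {
            // Subtraction keeps the comparison correct even if nanoTime wraps.
            if (System.nanoTime() - deadlineNanos >= 0) {
                throw new SearchTimeoutException("search exceeded its time limit");
            }
            delegate.collect(doc, score);
        }
    }

    /** Assumed meaning of decorate(hc, n): wrap hc with a limit of n milliseconds. */
    static HitCollector decorate(HitCollector hc, long timeoutMillis) {
        return new TimeoutCollector(hc, timeoutMillis);
    }

    public static void main(String[] args) {
        HitCollector printing = new HitCollector() {
            @Override
            public void collect(int doc, float score) {
                System.out.println("hit doc=" + doc + " score=" + score);
            }
        };
        decorate(printing, 60_000).collect(1, 0.5f); // well within the limit
        try {
            decorate(printing, 0).collect(2, 0.5f);  // limit already reached
        } catch (SearchTimeoutException e) {
            System.out.println("aborted: " + e.getMessage());
        }
    }
}
```

Checking the clock on every hit is simpler than a timer thread and avoids the visibility question raised above, at the cost of one `nanoTime()` call per collected document. Like the attached example, it aborts by throwing an unchecked exception, so hits collected before the deadline stay in the wrapped collector and it is up to the caller to decide whether to use them.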

> Add search timeout support to Lucene
> ------------------------------------
>
>                 Key: LUCENE-997

Timo Nentwig (JIRA) | 1 Jan 2008 13:23

[jira] Updated: (LUCENE-997) Add search timeout support to Lucene


     [
https://issues.apache.org/jira/browse/LUCENE-997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timo Nentwig updated LUCENE-997:
--------------------------------

    Attachment: HitCollectorTimeoutDecorator.java

Example of the timeout HitCollector implemented as a decorator.

> Add search timeout support to Lucene
> ------------------------------------
>
>                 Key: LUCENE-997
>                 URL: https://issues.apache.org/jira/browse/LUCENE-997
>             Project: Lucene - Java
>          Issue Type: New Feature
>            Reporter: Sean Timm
>            Priority: Minor
>         Attachments: HitCollectorTimeoutDecorator.java, LuceneTimeoutTest.java, timeout.patch, timeout.patch
>
>
> This patch is based on Nutch-308. 
> This patch adds support for a maximum search time limit. After this time is exceeded, the search thread is
> stopped, partial results (if any) are returned and the total number of results is estimated.
> This patch tries to minimize the overhead related to time-keeping by using a version of safe
> unsynchronized timer.
> This was also discussed in an e-mail thread.
> http://www.nabble.com/search-timeout-tf3410206.html#a9501029

Timo Nentwig (JIRA) | 1 Jan 2008 13:25

[jira] Updated: (LUCENE-997) Add search timeout support to Lucene


     [
https://issues.apache.org/jira/browse/LUCENE-997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timo Nentwig updated LUCENE-997:
--------------------------------

    Attachment: MyHitCollector.java

Example HitCollector to be decorated by HitCollectorTimeoutDecorator.

> Add search timeout support to Lucene
> ------------------------------------
>
>                 Key: LUCENE-997
>                 URL: https://issues.apache.org/jira/browse/LUCENE-997
>             Project: Lucene - Java
>          Issue Type: New Feature
>            Reporter: Sean Timm
>            Priority: Minor
>         Attachments: HitCollectorTimeoutDecorator.java, LuceneTimeoutTest.java, MyHitCollector.java,
timeout.patch, timeout.patch
>
>
> This patch is based on Nutch-308. 
> This patch adds support for a maximum search time limit. After this time is exceeded, the search thread is
> stopped, partial results (if any) are returned and the total number of results is estimated.
> This patch tries to minimize the overhead related to time-keeping by using a version of safe
> unsynchronized timer.
> This was also discussed in an e-mail thread.

Michael Busch (JIRA) | 1 Jan 2008 13:51

[jira] Resolved: (LUCENE-746) Incorrect error message in AnalyzingQueryParser.getPrefixQuery


     [
https://issues.apache.org/jira/browse/LUCENE-746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Busch resolved LUCENE-746.
----------------------------------

    Resolution: Fixed

Committed. Thanks, Ronnie!

> Incorrect error message in AnalyzingQueryParser.getPrefixQuery
> --------------------------------------------------------------
>
>                 Key: LUCENE-746
>                 URL: https://issues.apache.org/jira/browse/LUCENE-746
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Ronnie Kolehmainen
>            Priority: Minor
>         Attachments: AnalyzingQueryParser.getPrefixQuery.patch
>
>
> The error message of  getPrefixQuery is incorrect when tokens were added, for example by a stemmer. The
> message is "token was consumed" even if tokens were added.
> Attached is a patch, which when applied gives a better description of what actually happened.

--

-- 
This message is automatically generated by JIRA.