Hoss Man (JIRA) | 1 Jul 02:40 2009

[jira] Created: (LUCENE-1727) Order of stored Fields not maintained

Order of stored Fields not maintained
-------------------------------------

                 Key: LUCENE-1727
                 URL: https://issues.apache.org/jira/browse/LUCENE-1727
             Project: Lucene - Java
          Issue Type: Bug
          Components: Index
    Affects Versions: 2.4.1, 2.4
            Reporter: Hoss Man

As noted in these threads...

http://www.nabble.com/Order-of-fields-returned-by-Document.getFields%28%29-to21034652.html
http://www.nabble.com/Order-of-fields-within-a-Document-in-Lucene-2.4%2B-to24210597.html

somewhere prior to Lucene 2.4.1 a change was introduced that prevents the Stored fields of a Document from
being returned in the same order that they were originally added in.  This can cause serious performance
problems for people attempting to use LoadFirstFieldSelector or a custom FieldSelector with the
LOAD_AND_BREAK or SIZE_AND_BREAK options (since the fields don't come back in the order they expect).

Speculation in the email threads is that the origin of this bug is code introduced by LUCENE-1301 -- but the
purpose of that issue was refactoring, so if it really is the cause of the change this would seem to be a bug,
and not a side effect of a conscious implementation change.

Someone who understands indexing internals should investigate this.  At a minimum, if it's decided that
this is not actually a bug, then prior to resolving this issue the wiki docs and some of the FieldSelector
javadocs should be updated to make it clear what order Fields will be returned in.
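
To illustrate why field order matters for the *_AND_BREAK options, here is a minimal sketch in plain Java.
It is not Lucene code: the enum only borrows option names from FieldSelectorResult, and FieldOrderDemo,
accept and fieldsVisited are hypothetical names. It assumes that loading walks the stored fields in stored
order and stops at the first field for which the selector asks for a break.

```java
import java.util.Arrays;
import java.util.List;

// Simplified stand-in for stored-field loading; this is NOT Lucene code.
class FieldOrderDemo {

    // Borrows three option names from Lucene's FieldSelectorResult for illustration.
    enum Result { NO_LOAD, LOAD, LOAD_AND_BREAK }

    /** Hypothetical selector: wants only the "id" field and stops once it is seen. */
    static Result accept(String fieldName) {
        return fieldName.equals("id") ? Result.LOAD_AND_BREAK : Result.NO_LOAD;
    }

    /** Walks fields in stored order; returns how many were visited up to the break. */
    static int fieldsVisited(List<String> storedOrder) {
        int visited = 0;
        for (String name : storedOrder) {
            visited++;
            if (accept(name) == Result.LOAD_AND_BREAK) {
                break; // nothing after this field is looked at
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // Order in which the application added the fields: "id" first.
        List<String> asAdded = Arrays.asList("id", "title", "body", "attachment");
        // A made-up different stored order, standing in for the reordering reported here.
        List<String> reordered = Arrays.asList("attachment", "body", "id", "title");

        System.out.println("visited (as added):  " + fieldsVisited(asAdded));
        System.out.println("visited (reordered): " + fieldsVisited(reordered));
    }
}
```

With "id" stored first the walk stops after a single field; in the made-up second ordering it has to pass
over two unwanted fields before it can break, which is the kind of extra work described above.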


Rob ten Hove (JIRA) | 1 Jul 10:50 2009

[jira] Commented: (LUCENE-1373) Most of the contributed Analyzers suffer from invalid recognition of acronyms.


    [
https://issues.apache.org/jira/browse/LUCENE-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725972#action_12725972
] 

Rob ten Hove commented on LUCENE-1373:
--------------------------------------

Is it possible that, due to the same bug, a property whose value ends in "Type" (like "InputFileType") is not
indexed when the OS language is Dutch? I have two installations of Alfresco 3 Labs with Lucene 2.1.0
autoinstalled, both with exactly the same installation options (English as the language for Alfresco). The
main difference, apart from the hardware, is the OS language: both run XP with SP2, but one is English and the
other Dutch. In the installation on the Dutch OS, three properties with values ending in "Type" could not be
found, whereas they are present in the English version.

> Most of the contributed Analyzers suffer from invalid recognition of acronyms.
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-1373
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1373
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis, contrib/analyzers
>    Affects Versions: 2.3.2
>            Reporter: Mark Lassau
>            Priority: Minor
>         Attachments: LUCENE-1373.patch
>
>
> LUCENE-1068 describes a bug in StandardTokenizer whereby a string like "www.apache.org." would be

Michael McCandless (JIRA) | 1 Jul 11:24 2009

[jira] Assigned: (LUCENE-1727) Order of stored Fields not maintained


     [
https://issues.apache.org/jira/browse/LUCENE-1727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless reassigned LUCENE-1727:
------------------------------------------

    Assignee: Michael McCandless

> Order of stored Fields not maintained
> -------------------------------------
>
>                 Key: LUCENE-1727
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1727
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 2.4, 2.4.1
>            Reporter: Hoss Man
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>
> As noted in these threads...
> http://www.nabble.com/Order-of-fields-returned-by-Document.getFields%28%29-to21034652.html
> http://www.nabble.com/Order-of-fields-within-a-Document-in-Lucene-2.4%2B-to24210597.html
> somewhere prior to Lucene 2.4.1 a change was introduced that prevents the Stored fields of a Document from
> being returned in same order that they were originally added in.  This can cause serious performance
> problems for people attempting to use LoadFirstFieldSelector or a custom FieldSelector with the
> LOAD_AND_BREAK, or the SIZE_AND_BREAK options (since the fields don't come back in the order they expect)

Michael McCandless (JIRA) | 1 Jul 11:24 2009

[jira] Updated: (LUCENE-1727) Order of stored Fields not maintained


     [
https://issues.apache.org/jira/browse/LUCENE-1727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1727:
---------------------------------------

    Fix Version/s: 2.9

> Order of stored Fields not maintained
> -------------------------------------
>
>                 Key: LUCENE-1727
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1727
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 2.4, 2.4.1
>            Reporter: Hoss Man
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>
> As noted in these threads...
> http://www.nabble.com/Order-of-fields-returned-by-Document.getFields%28%29-to21034652.html
> http://www.nabble.com/Order-of-fields-within-a-Document-in-Lucene-2.4%2B-to24210597.html
> somewhere prior to Lucene 2.4.1 a change was introduced that prevents the Stored fields of a Document from
> being returned in same order that they were originally added in.  This can cause serious performance
> problems for people attempting to use LoadFirstFieldSelector or a custom FieldSelector with the
> LOAD_AND_BREAK, or the SIZE_AND_BREAK options (since the fields don't come back in the order they expect)

Simon Willnauer (JIRA) | 1 Jul 12:02 2009

[jira] Commented: (LUCENE-1722) SmartChineseAnalyzer javadoc improvement


    [
https://issues.apache.org/jira/browse/LUCENE-1722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725985#action_12725985
] 

Simon Willnauer commented on LUCENE-1722:
-----------------------------------------

bq. In my opinion, things such as Utility class and everything in .hhmm package should be
package-protected. I already wasted time partially javadocing these things, which probably wasn't a
complete waste, but you get the idea.
Yeah, those could be cleaned up quite a bit. Let's do this in a different patch / issue after this one is committed.

bq. I think in the short term, I like this patch as is because I think developers will be able to port it to the
new API and users will be able to understand what it does.
Let's get it in; it is a huge improvement to the Chinese documentation, and needed too.

bq. I can come back around later and do a more thorough job, but this isn't the only analyzer that needs some
documentation improvements!
Thanks for the attitude!

simon

> SmartChineseAnalyzer javadoc improvement
> ----------------------------------------
>
>                 Key: LUCENE-1722
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1722
>             Project: Lucene - Java
>          Issue Type: Improvement

Simon Willnauer (JIRA) | 1 Jul 12:34 2009

[jira] Resolved: (LUCENE-1722) SmartChineseAnalyzer javadoc improvement


     [
https://issues.apache.org/jira/browse/LUCENE-1722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer resolved LUCENE-1722.
-------------------------------------

       Resolution: Fixed
    Fix Version/s: 2.9

Just committed this javadoc improvement - thanks, Robert!

> SmartChineseAnalyzer javadoc improvement
> ----------------------------------------
>
>                 Key: LUCENE-1722
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1722
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/analyzers
>            Reporter: Robert Muir
>            Assignee: Simon Willnauer
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1722.txt
>
>
> Chinese -> English, and corrections to match reality (removes several javadoc warnings)


Simon Willnauer (JIRA) | 1 Jul 13:18 2009

[jira] Updated: (LUCENE-1566) Large Lucene index can hit false OOM due to Sun JRE issue


     [
https://issues.apache.org/jira/browse/LUCENE-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-1566:
------------------------------------

    Attachment: LUCENE-1566.patch

Forgot to enable asserts in the test case - updated the patch to correctly assert offset / length.

> Large Lucene index can hit false OOM due to Sun JRE issue
> ---------------------------------------------------------
>
>                 Key: LUCENE-1566
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1566
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 2.4.1
>            Reporter: Michael McCandless
>            Assignee: Simon Willnauer
>            Priority: Minor
>         Attachments: LUCENE-1566.patch, LUCENE-1566.patch
>
>
> This is not a Lucene issue, but I want to open this so future google
> diggers can more easily find it.
> There's this nasty bug in Sun's JRE:
>   http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6478546

Shai Erera (JIRA) | 1 Jul 14:46 2009

[jira] Commented: (LUCENE-1720) TimeLimitedIndexReader and associated utility class


    [
https://issues.apache.org/jira/browse/LUCENE-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726037#action_12726037
] 

Shai Erera commented on LUCENE-1720:
------------------------------------

# Can we call Thread.currentThread() only once per method? Currently stop() calls it 3 times,
start() 2 and checkTimeoutIsThisThread()
# Can we change TimeoutThread to just wait() on timeLimitedThreads, instead of wait(1000)? I think it's
enough to rely on notify(). Otherwise, if my system is idle, this thread will wake up every second for nothing.
# Maybe change checkTimeoutIsThisThread to do "if (timedOutThreads.remove(Thread.currentThread()) != null)"
and delete the next line? If a thread has timed out, there's no need to look it up in the map twice.
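
A minimal sketch of the single-lookup check from the last point, assuming timedOutThreads is a Map keyed by
Thread. The value type (a Long timestamp here) and the class name SingleLookupDemo are made up for
illustration and are not taken from the patch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class SingleLookupDemo {
    // Hypothetical registry: thread -> time at which it was flagged as timed out.
    static final Map<Thread, Long> timedOutThreads = new ConcurrentHashMap<Thread, Long>();

    /** Reports whether the calling thread was flagged, clearing the flag in the same map operation. */
    static boolean checkTimeoutIsThisThread() {
        Thread current = Thread.currentThread(); // fetched once per method
        // Map.remove returns the previous value, or null if there was no entry,
        // so a separate containsKey/get call is not needed.
        return timedOutThreads.remove(current) != null;
    }

    public static void main(String[] args) {
        timedOutThreads.put(Thread.currentThread(), Long.valueOf(System.currentTimeMillis()));
        System.out.println(checkTimeoutIsThisThread()); // entry present, gets removed
        System.out.println(checkTimeoutIsThisThread()); // entry already gone
    }
}
```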

Question - TimeoutThread checks firstAnticipatedTimeout vs. the current time and, if it has been exceeded,
adds firstAnticipatedThreadToFail to timedOutThreads. I think it will fail in the following scenario:
* Thread 1 starts an activity w/ time 100 (the expected exceeded time).
* Thread 2 starts an activity w/ time 150.
* Thread 1 is stuck somewhere.
* TimeoutThread checks the current time against firstAnticipatedTimeout and adds
firstAnticipatedThreadToFail to timedOutThreads.
* Thread 2 checks for timeout, but timedOutThreads does not contain it. Therefore it continues its execution.

If a thread is stuck, the rest of the threads should not fail on their "timeout exceeded" checks. How about if,
when TimeoutThread detects that the first timeout has been reached, it will: (1) add that thread to the
timedOutThreads Set and (2) set "first timeout" to be the next in the Map/List. I think we'll need an
additional LinkedList or something, so that start(), stop(), check() and TimeoutThread.run() will
work properly, but that shouldn't be too complicated.
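
One possible shape for this, sketched under assumptions: it uses a TreeMap sorted by deadline in place of the
LinkedList mentioned above, identifies threads by name so the example is deterministic, and assumes deadlines
are unique. TimeoutRegistryDemo, sweep and isTimedOut are hypothetical names, not the ones in the patch.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class TimeoutRegistryDemo {
    // Deadline -> thread name, sorted so the earliest deadline always comes first.
    // Assumes unique deadlines; a fuller version would have to handle ties.
    private final TreeMap<Long, String> deadlines = new TreeMap<Long, String>();
    private final Set<String> timedOut = new HashSet<String>();

    synchronized void start(String thread, long deadline) {
        deadlines.put(deadline, thread);
    }

    synchronized void stop(String thread) {
        deadlines.values().remove(thread); // drop a pending deadline, if any
        timedOut.remove(thread);           // and clear a timed-out flag, if any
    }

    /** What the watchdog would do on wake-up: expire every deadline that is <= now. */
    synchronized void sweep(long now) {
        Iterator<Map.Entry<Long, String>> it = deadlines.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, String> entry = it.next();
            if (entry.getKey() > now) {
                break; // entries are sorted, so everything after this is still pending
            }
            timedOut.add(entry.getValue());
            it.remove(); // the next entry becomes the new "first timeout"
        }
    }

    synchronized boolean isTimedOut(String thread) {
        return timedOut.contains(thread);
    }

    public static void main(String[] args) {
        TimeoutRegistryDemo registry = new TimeoutRegistryDemo();
        registry.start("thread-1", 100L);
        registry.start("thread-2", 150L);

        registry.sweep(120L); // thread-1 is stuck past its deadline
        System.out.println("thread-1 timed out: " + registry.isTimedOut("thread-1"));
        System.out.println("thread-2 timed out: " + registry.isTimedOut("thread-2"));

        registry.sweep(160L); // thread-2's deadline is still tracked and now expires
        System.out.println("thread-2 timed out: " + registry.isTimedOut("thread-2"));
    }
}
```

Because an expired entry is removed as soon as it is flagged, a stuck thread 1 no longer hides thread 2's
deadline from later sweeps.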


Mark Miller | 1 Jul 15:13 2009

Re: [jira] Commented: (LUCENE-1693) AttributeSource/TokenStream API improvements

How's the progress here, guys? I have 2 or 3 issues that relate to this, and I really don't want to commit/finish them until this is done ...

Robert Muir (JIRA) | 1 Jul 15:26 2009

[jira] Commented: (LUCENE-1722) SmartChineseAnalyzer javadoc improvement


    [
https://issues.apache.org/jira/browse/LUCENE-1722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726053#action_12726053
] 

Robert Muir commented on LUCENE-1722:
-------------------------------------

Simon, thanks. It was mentioned on the mailing list that perhaps this analyzer might be moved in the future
(since the datafiles cause analyzers.jar to be very large).

So, maybe at the same time when/if this is done the files could be reorganized in a way that allows a lot of
these internal classes to be marked package private.

> SmartChineseAnalyzer javadoc improvement
> ----------------------------------------
>
>                 Key: LUCENE-1722
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1722
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/analyzers
>            Reporter: Robert Muir
>            Assignee: Simon Willnauer
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1722.txt
>
>
> Chinese -> English, and corrections to match reality (removes several javadoc warnings)

--

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
