Marvin Humphrey | 1 Dec 02:10 2010

Re: deprecating Versions

On Mon, Nov 29, 2010 at 05:34:27AM -0500, Robert Muir wrote:
> Is it somehow possible I could convince everyone that all the analyzers we
> provide are simply examples?  This way we could really make this a bit more
> reasonable and clean up a lot of stuff.

I understand what you're getting at.  We don't really expect people to fork an
analyzer code base, though -- so we need to draw a line between e.g. the code
that implements StopFilter and stoplist content.   We want the low-level code
to be part of the library, but maybe we want stoplist content to be considered
example code.

> Seems like we really want to move towards a more declarative model where
> these are just config files... so only then will it be OK for us to change them
> because they suddenly aren't suffixed with .java?!

Consider how this might work with e.g. RussianAnalyzer.  The
declaratively-expressed sample analyzer config could contain a hard-coded list
of Russian stop words, and as this hard-coded stoplist would travel with the
index in a config file, there would be no index compatibility problems upon
upgrading Lucene.  The stoplist in the sample config could change, even on
bugfix releases.
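To make the stoplist-as-data idea concrete, here is a minimal Java sketch, assuming the analyzer spec is a java.util.Properties file stored alongside the index with a comma-separated "stopwords" key. The class name DeclarativeStoplist and the "stopwords" key are hypothetical; this is not a Lucene API, and it filters an already-tokenized list rather than a real TokenStream.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Properties;
import java.util.Set;

// Hypothetical sketch: the stoplist is data read from a config file that
// travels with the index, not a constant compiled into an analyzer class.
final class DeclarativeStoplist {
    private final Set<String> stopWords;

    DeclarativeStoplist(Set<String> stopWords) {
        this.stopWords = Collections.unmodifiableSet(new HashSet<>(stopWords));
    }

    /** Parses config text whose (assumed) "stopwords" key holds a comma-separated list. */
    static DeclarativeStoplist fromConfig(String configText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(configText));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected from a StringReader
        }
        Set<String> words = new HashSet<>();
        for (String w : props.getProperty("stopwords", "").split(",")) {
            String trimmed = w.trim();
            if (!trimmed.isEmpty()) {
                words.add(trimmed);
            }
        }
        return new DeclarativeStoplist(words);
    }

    /** Returns the tokens that are not in the stoplist, preserving order. */
    List<String> filter(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            if (!stopWords.contains(t)) {
                kept.add(t);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // In this sketch the config would be written next to the index at index
        // time and read back at search time, so a library upgrade cannot change
        // which words are stopped for an existing index.
        DeclarativeStoplist stoplist = fromConfig("stopwords = the, a, of\n");
        System.out.println(stoplist.filter(Arrays.asList("the", "book", "of", "spells")));
    }
}
```

The library would own only the filtering code, while the word list is plain data, which matches the line drawn above between the StopFilter implementation and stoplist content.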

Config file syntax would potentially be affected by a Lucene upgrade, but that
doesn't affect index content and maintaining back compat is straightforward.

Things are more difficult with versioning e.g. stemmers, but I think the
stoplist example illustrates the potential of declarative analyzer
specification.  Maybe specifying Version in a sample file and dispatching to
different revs of a Snowball stemmer is less painful than forcing a user to
figure out Version from API documentation?
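A minimal sketch of that dispatch, assuming the index's config records a stemmer revision by name and the library keeps every revision around. StemmerRev, its constant names, and the toy suffix rules are invented for illustration; they stand in for real Snowball stemmer revisions and are not Lucene's Version class.

```java
import java.util.Locale;

// Hypothetical sketch: each enum constant is one frozen revision of a stemmer.
// The toy rules below stand in for real Snowball stemmer revisions.
enum StemmerRev {
    REV_1 {
        @Override
        String stem(String term) {
            // Original rule: strip any trailing "s".
            return term.endsWith("s") ? term.substring(0, term.length() - 1) : term;
        }
    },
    REV_2 {
        @Override
        String stem(String term) {
            // "Bugfixed" rule: keep words ending in "ss" intact, otherwise behave as REV_1.
            return term.endsWith("ss") ? term : REV_1.stem(term);
        }
    };

    abstract String stem(String term);

    /** Resolves the revision name recorded in the index's analyzer config. */
    static StemmerRev fromConfig(String value) {
        // Locale.ROOT keeps the case mapping independent of the default locale.
        return valueOf(value.trim().toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // An old index keeps the behavior it was built with; new indexes get REV_2.
        System.out.println(fromConfig("rev_1").stem("glass")); // glas
        System.out.println(fromConfig("rev_2").stem("glass")); // glass
    }
}
```

Old indexes pin the revision they were built with while new ones pick up the fix, roughly the trade-off Version makes, except that the choice is recorded in a file rather than in application code.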

Chris Male (JIRA) | 1 Dec 02:47 2010

[jira] Commented: (LUCENE-2139) Cleanup and Improvement of Spatial Contrib


    [ https://issues.apache.org/jira/browse/LUCENE-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965531#action_12965531 ]

Chris Male commented on LUCENE-2139:
------------------------------------

So we're good with closing? I'll close all associated issues too and open a new one for the
deprecation/nuking/re-birth work.

> Cleanup and Improvement of Spatial Contrib
> ------------------------------------------
>
>                 Key: LUCENE-2139
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2139
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/spatial
>    Affects Versions: 3.1
>            Reporter: Chris Male
>            Assignee: Simon Willnauer
>         Attachments: LUCENE-2139-Java5Only.patch, LUCENE-2139-svnScript.sh, LUCENE-2139.patch
>
>
> The current spatial contrib can be improved by adding documentation, tests, removing unused classes and
> code, repackaging the classes and improving the performance of the distance filtering.  The latter will
> incorporate the multi-threaded functionality introduced in LUCENE-1732.
> Other improvements involve adding better support for different distance units, different distance
> calculators and different data formats (whether it be lat/long fields, geohashes, or something else in

Grant Ingersoll (JIRA) | 1 Dec 02:51 2010

[jira] Commented: (SOLR-2241) Upgrade to Tika 0.8


    [ https://issues.apache.org/jira/browse/SOLR-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965532#action_12965532 ]

Grant Ingersoll commented on SOLR-2241:
---------------------------------------

Trunk: Committed revision 1040815.
Branch 3.x: Committed revision 1040852.

> Upgrade to Tika 0.8
> -------------------
>
>                 Key: SOLR-2241
>                 URL: https://issues.apache.org/jira/browse/SOLR-2241
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>             Fix For: 3.1, 4.0
>
>
> as the title says

--

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Grant Ingersoll (JIRA) | 1 Dec 02:51 2010

[jira] Commented: (SOLR-2088) contrib/extraction fails on a turkish computer


    [ https://issues.apache.org/jira/browse/SOLR-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965533#action_12965533 ]

Grant Ingersoll commented on SOLR-2088:
---------------------------------------

Should be resolved via SOLR-2241.

> contrib/extraction fails on a turkish computer
> ----------------------------------------------
>
>                 Key: SOLR-2088
>                 URL: https://issues.apache.org/jira/browse/SOLR-2088
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - Solr Cell (Tika extraction)
>            Reporter: Robert Muir
>            Assignee: Grant Ingersoll
>             Fix For: 3.1, 4.0
>
>
> reproduce with: ant test -Dtests.locale=tr_TR
> {noformat}
> test:
>     [junit] Running org.apache.solr.handler.ExtractingRequestHandlerTest
>     [junit]  xml response was: <?xml version="1.0" encoding="UTF-8"?>
>     [junit] <response>
>     [junit] <lst name="responseHeader"><int name="status">0</int><int name="QTime">5</int></lst>

Robert Muir (JIRA) | 1 Dec 02:59 2010

[jira] Resolved: (SOLR-2088) contrib/extraction fails on a turkish computer


     [ https://issues.apache.org/jira/browse/SOLR-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir resolved SOLR-2088.
-------------------------------

    Resolution: Fixed

With Tika 0.8 this is no longer a problem: HTML/PDF extraction seems to work fine (the tests pass).


Apache Hudson Server | 1 Dec 03:05 2010

Lucene-Solr-tests-only-trunk - Build # 2039 - Failure

Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/2039/

1 tests failed.
REGRESSION:  org.apache.solr.TestDistributedSearch.testDistribSearch

Error Message:
Some threads threw uncaught exceptions!

Stack Trace:
junit.framework.AssertionFailedError: Some threads threw uncaught exceptions!
	at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:979)
	at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:917)
	at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:466)
	at org.apache.solr.SolrTestCaseJ4.tearDown(SolrTestCaseJ4.java:92)
	at org.apache.solr.BaseDistributedSearchTestCase.tearDown(BaseDistributedSearchTestCase.java:144)

Build Log (for compile errors):
[...truncated 8709 lines...]

Yonik Seeley (JIRA) | 1 Dec 03:06 2010

[jira] Commented: (LUCENE-2139) Cleanup and Improvement of Spatial Contrib


    [ https://issues.apache.org/jira/browse/LUCENE-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965540#action_12965540 ]

Yonik Seeley commented on LUCENE-2139:
--------------------------------------

bq. As far as I understand the spatial search is done by two range queries ANDed

Yep - but they are normally trie-range (i.e. Lucene Numeric) ranges, so pretty efficient anyway.
From what I saw of the existing tier code, I can't say that it would have been more efficient... and I can say
that at least for some spatial queries, it would have been less efficient (since it always seemed to use one
tier level for a query, not multiple).
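For readers unfamiliar with the approach being discussed, here is a plain-Java sketch of a bounding box tested as two ranges ANDed together, one on latitude and one on longitude. In Lucene each range would be a numeric (trie) range query over an indexed field; here the ranges are simple comparisons so the example is self-contained. It assumes a spherical Earth with a mean radius of 6371 km and does not handle boxes that reach a pole or cross the 180th meridian; BoundingBox is a hypothetical name.

```java
// Hypothetical sketch: a "two ranges ANDed" bounding-box check.
// Assumes a spherical Earth; ignores boxes reaching a pole or crossing the 180th meridian.
final class BoundingBox {
    static final double EARTH_RADIUS_KM = 6371.0; // mean radius, an approximation

    final double minLat, maxLat, minLon, maxLon;

    private BoundingBox(double minLat, double maxLat, double minLon, double maxLon) {
        this.minLat = minLat;
        this.maxLat = maxLat;
        this.minLon = minLon;
        this.maxLon = maxLon;
    }

    /** A lat/lon box enclosing every point within radiusKm of (lat, lon). */
    static BoundingBox around(double lat, double lon, double radiusKm) {
        double angular = radiusKm / EARTH_RADIUS_KM; // radius as an angle, in radians
        double dLat = Math.toDegrees(angular);
        double ratio = Math.sin(angular) / Math.cos(Math.toRadians(lat));
        if (ratio >= 1.0) {
            throw new IllegalArgumentException("circle reaches a pole; not handled here");
        }
        // A circle spans more degrees of longitude the farther it is from the equator.
        double dLon = Math.toDegrees(Math.asin(ratio));
        return new BoundingBox(lat - dLat, lat + dLat, lon - dLon, lon + dLon);
    }

    /** The conjunction of a latitude range test and a longitude range test. */
    boolean contains(double lat, double lon) {
        boolean inLatRange = lat >= minLat && lat <= maxLat;
        boolean inLonRange = lon >= minLon && lon <= maxLon;
        return inLatRange && inLonRange;
    }
}
```

Points in the corners of the box can still be farther than radiusKm from the center, so an exact distance check would normally follow the box filter.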


Chris Male (JIRA) | 1 Dec 03:10 2010

[jira] Commented: (LUCENE-2139) Cleanup and Improvement of Spatial Contrib


    [ https://issues.apache.org/jira/browse/LUCENE-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965541#action_12965541 ]

Chris Male commented on LUCENE-2139:
------------------------------------

I'd say that that's what we want to determine in some way as part of a redo of the spatial code. Maybe we need to
include some benchmarking functionality.


Shai Erera (JIRA) | 1 Dec 05:53 2010

[jira] Updated: (LUCENE-2779) Use ConcurrentHashMap in RAMDirectory


     [ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-2779:
-------------------------------

    Attachment: LUCENE-2779.patch

Patch w/ the latest code + a typo fix. I will commit it later today.

> Use ConcurrentHashMap in RAMDirectory
> -------------------------------------
>
>                 Key: LUCENE-2779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2779
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2779-backwardsfix.patch, LUCENE-2779.patch, LUCENE-2779.patch, LUCENE-2779.patch, LUCENE-2779.patch, TestCHM.java
>
>
> RAMDirectory synchronizes on its instance in many places to protect access to the map of RAMFiles, in
> addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while
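The change named in the issue title can be sketched as follows. This is a simplified, hypothetical stand-in (ConcurrentRamDir, with files held as byte arrays), not the actual RAMDirectory patch: a ConcurrentHashMap replaces the instance-synchronized map so read paths need no lock, and an AtomicLong tracks the total size.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, heavily simplified in-memory directory; not Lucene's RAMDirectory.
final class ConcurrentRamDir {
    private final ConcurrentMap<String, byte[]> files = new ConcurrentHashMap<>();
    private final AtomicLong sizeInBytes = new AtomicLong();

    void writeFile(String name, byte[] data) {
        // put() returns the value it replaced atomically, so concurrent writers
        // each apply the correct size delta.
        byte[] previous = files.put(name, data);
        sizeInBytes.addAndGet(data.length - (previous == null ? 0 : previous.length));
    }

    void deleteFile(String name) {
        byte[] removed = files.remove(name);
        if (removed != null) {
            sizeInBytes.addAndGet(-removed.length);
        }
    }

    // Read paths take no lock at all.
    boolean fileExists(String name) {
        return files.containsKey(name);
    }

    String[] listAll() {
        return files.keySet().toArray(new String[0]);
    }

    // May briefly lag the map, since the map and the counter are updated separately.
    long sizeInBytes() {
        return sizeInBytes.get();
    }
}
```

The trade-off in this sketch is that the map and the size counter are no longer updated under one lock, so sizeInBytes() is only eventually consistent with the file map.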

Ibrahim (JIRA) | 1 Dec 06:31 2010

[jira] Created: (LUCENE-2786) no need for LowerCaseFilter from ArabicAnalyzer

no need for LowerCaseFilter from ArabicAnalyzer
-----------------------------------------------

                 Key: LUCENE-2786
                 URL: https://issues.apache.org/jira/browse/LUCENE-2786
             Project: Lucene - Java
          Issue Type: Improvement
          Components: contrib/analyzers
    Affects Versions: 3.0.2
         Environment: All
            Reporter: Ibrahim
            Priority: Trivial

There is no need for this line (line 171) in ArabicAnalyzer:

result = new LowerCaseFilter(result);

simply because the Arabic script has no upper or lower case letters, so case folding is irrelevant to it.
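The premise can be checked with the JDK alone. As far as I know, LowerCaseFilter lowercases each code point with Character.toLowerCase, so this hypothetical helper (caseInvariant) reports whether that mapping would change a given string.

```java
// Hypothetical helper: checks whether per-code-point lowercasing is a no-op.
final class ArabicCaseCheck {
    static boolean caseInvariant(String text) {
        return text.codePoints().allMatch(cp -> Character.toLowerCase(cp) == cp);
    }

    public static void main(String[] args) {
        String kitab = "\u0643\u062A\u0627\u0628"; // kaf, teh, alef, beh: "kitab" ("book")
        System.out.println(caseInvariant(kitab));    // true: Arabic letters have no case
        System.out.println(caseInvariant("Lucene")); // false: 'L' maps to 'l'
    }
}
```

The second call shows what the filter would still do in this analyzer: any Latin-script tokens mixed into Arabic text would be lowercased, which may be worth weighing before the line is removed.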

