[jira] [Commented] (SOLR-2880) Investigate adding an overseer that can assign shards, later do re-balancing, etc


    [
https://issues.apache.org/jira/browse/SOLR-2880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160500#comment-13160500
] 

Mark Miller commented on SOLR-2880:
-----------------------------------

Would it be more convenient to store what's on the nodes in node_states and node_assignments in a human-readable
form, like the ZkNodeProps?

I actually just noticed that we use the same ZkNodeProps for the current leader ZK node as we use for the node in the
cluster state. That's not a problem for info that doesn't change, but now that we store the "active,replicating"
state there, as well as other properties that could be dynamic, it ends up with stale data as those
properties change.
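
For illustration, a minimal sketch of the staleness problem being described (the ZK paths, the serialize() helper, and the property names are assumptions, not the actual SolrCloud code):

{code}
import java.util.HashMap;
import java.util.Map;

import org.apache.zookeeper.ZooKeeper;

public class StaleLeaderPropsSketch {

  // Stand-in for however ZkNodeProps gets serialized; the question above is
  // whether a human-readable form (key=value text, JSON, ...) would be nicer.
  static byte[] serialize(Map<String, String> props) {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<String, String> e : props.entrySet()) {
      sb.append(e.getKey()).append('=').append(e.getValue()).append('\n');
    }
    return sb.toString().getBytes();
  }

  public static void main(String[] args) throws Exception {
    ZooKeeper zk = new ZooKeeper("localhost:2181", 10000, null);

    Map<String, String> props = new HashMap<String, String>();
    props.put("url", "http://host:8983/solr/core1/");
    props.put("state", "recovering"); // dynamic property

    // The same serialized props are written to two places (paths made up):
    zk.setData("/collections/collection1/leader", serialize(props), -1);
    zk.setData("/node_states/host:8983_solr", serialize(props), -1);

    // Later the replica becomes active, but only the node_states copy is
    // rewritten -- the other copy still says "recovering", i.e. stale data.
    props.put("state", "active");
    zk.setData("/node_states/host:8983_solr", serialize(props), -1);

    zk.close();
  }
}
{code}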

> Investigate adding an overseer that can assign shards, later do re-balancing, etc
> ---------------------------------------------------------------------------------
>
>                 Key: SOLR-2880
>                 URL: https://issues.apache.org/jira/browse/SOLR-2880
>             Project: Solr
>          Issue Type: Sub-task
>          Components: SolrCloud
>            Reporter: Mark Miller
>            Assignee: Mark Miller
>             Fix For: 4.0
>
>         Attachments: SOLR-2880-merge-elections.patch, SOLR-2880.patch
>
(Continue reading)

[jira] [Commented] (LUCENE-3606) Make IndexReader really read-only in Lucene 4.0


    [
https://issues.apache.org/jira/browse/LUCENE-3606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160515#comment-13160515
] 

Uwe Schindler commented on LUCENE-3606:
---------------------------------------

OK, I will work on this as soon as I can (next weekend). I will be glad to remove the copy-on-write setNorm
stuff in the Lucene40 codec and make the Lucene3x codec completely read-only (only reading the newest norm
file). I hope Robert will help me :-)

> Make IndexReader really read-only in Lucene 4.0
> -----------------------------------------------
>
>                 Key: LUCENE-3606
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3606
>             Project: Lucene - Java
>          Issue Type: Task
>          Components: core/index
>    Affects Versions: 4.0
>            Reporter: Uwe Schindler
>
> As we are changing the API completely in Lucene 4.0, we are also free to remove read-write access and commits from
IndexReader. This code is so hairy and buggy (as investigated by Robert and Mike today) when you work at the
SegmentReader level but forget to flush in the DirectoryReader that it's better to really make
IndexReaders read-only.
> Currently with IndexReader you can do things like:
> - delete/undelete Documents -> Can be done with IndexWriter, too (using deleteByQuery)
> - change norms -> this is a bad idea in general, but when we remove norms entirely and replace them with
DocValues this is obsolete already. Changing DocValues should also be done using IndexWriter in trunk (once it is ready)
(Continue reading)

[jira] [Assigned] (LUCENE-3606) Make IndexReader really read-only in Lucene 4.0


     [
https://issues.apache.org/jira/browse/LUCENE-3606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler reassigned LUCENE-3606:
-------------------------------------

    Assignee: Uwe Schindler

> Make IndexReader really read-only in Lucene 4.0
> -----------------------------------------------
>
>                 Key: LUCENE-3606
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3606
>             Project: Lucene - Java
>          Issue Type: Task
>          Components: core/index
>    Affects Versions: 4.0
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>
> As we are changing the API completely in Lucene 4.0, we are also free to remove read-write access and commits from
IndexReader. This code is so hairy and buggy (as investigated by Robert and Mike today) when you work at the
SegmentReader level but forget to flush in the DirectoryReader that it's better to really make
IndexReaders read-only.
> Currently with IndexReader you can do things like:
> - delete/undelete Documents -> Can be done with IndexWriter, too (using deleteByQuery)
> - change norms -> this is a bad idea in general, but when we remove norms entirely and replace them with
DocValues this is obsolete already. Changing DocValues should also be done using IndexWriter in trunk (once it is ready)

(Continue reading)

[jira] [Commented] (LUCENE-3606) Make IndexReader really read-only in Lucene 4.0


    [
https://issues.apache.org/jira/browse/LUCENE-3606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160536#comment-13160536
] 

Robert Muir commented on LUCENE-3606:
-------------------------------------

Hi, I will help too. I think norms by itself is a pretty big project to clean up, and it's a good one to do first.

We don't have to do it this way, but here is my idea of a way we could do it in committable steps:
# remove setNorm from IR first, and fix all tests.
# rename NormsWriter to NormsConsumer; rote refactor of the norms I/O code into the codec as a NormsFormat (yes,
with just one default implementation that just reads whole byte[]s) -- see the sketch after these lists.
# remove the IndexFileNames constant; the default implementation handles files(), including the .sNNN hairiness.
# create a SimpleText implementation.

Then even more cleanups:
# split the default implementation into Preflex (with all the hairiness like .sNNN) and Lucene40 (a clean implementation).
# clean up the 'behind the scenes' API: e.g. NormsFormat presents a DocValues API (hardcoded to fixed bytes),
SegmentReader does getArray(), and IndexReader still returns just byte[].
# finally, the "holy grail", where similarities can declare the normalization factor(s) they need, using
byte/float/int/whatever, and it's all unified with the DocValues API. IndexReader.norms() maybe goes
away here, and maybe NormsFormat too.
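
To make the NormsFormat step a bit more concrete, here is a rough, purely hypothetical sketch of what such a codec-level extension point could look like. Only the NormsFormat and NormsConsumer names come from the steps above; the method names, the NormsReader type, and the files() signature are invented for illustration and are not a committed API:

{code}
import java.io.IOException;
import java.util.Set;

// Hypothetical sketch only -- not a committed or existing API.
public abstract class NormsFormat {

  /** Writes norms for a segment at index/merge time (the renamed NormsWriter). */
  public abstract NormsConsumer normsConsumer(SegmentWriteState state) throws IOException;

  /** Reads norms back; the single default impl would just read whole byte[]s. */
  public abstract NormsReader normsReader(SegmentReadState state) throws IOException;

  /** Reports the files this format owns, so the .sNNN hairiness can live in the
   *  preflex implementation instead of an IndexFileNames constant. */
  public abstract void files(SegmentInfo info, Set<String> files) throws IOException;
}
{code}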

                
> Make IndexReader really read-only in Lucene 4.0
> -----------------------------------------------
>
>                 Key: LUCENE-3606
(Continue reading)

[jira] [Commented] (LUCENE-3606) Make IndexReader really read-only in Lucene 4.0


    [
https://issues.apache.org/jira/browse/LUCENE-3606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160559#comment-13160559
] 

Robert Muir commented on LUCENE-3606:
-------------------------------------

{quote}
finally, "holy grail" where similarities can declare the normalization factor(s) they need, using
byte/float/int whatever, and its all unified with the docvalues api. IndexReader.norms() maybe goes
away here, and maybe NormsFormat too.
{quote}

Thinking about this: a clean way to do it would be for Similarity to get a new method:
{code}
ValueType getValueType();
{code}

and we would change:
{code}
byte computeNorm(FieldInvertState state);
{code}
to:
{code}
void computeNorm(FieldInvertState state, PerDocFieldValues norm);
{code}

Sims that want to encode multiple index-time scoring factors separately 
could just use BYTES_FIXED_STRAIGHT. This should be only for some rare
(Continue reading)
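
As a sketch of how the proposed getValueType()/computeNorm() signatures above might be used (hypothetical: only the two method signatures come from the comment; the ValueType constant and the PerDocFieldValues setter name are assumptions, and this would not compile against any released version):

{code}
// Hypothetical similarity written against the *proposed* API above.
public class FloatNormSimilarity extends Similarity {

  // Declare the per-document value type this similarity wants for its norm.
  public ValueType getValueType() {
    return ValueType.FLOAT_32; // assumed constant name
  }

  // Write the raw factor instead of squeezing it into a single byte.
  public void computeNorm(FieldInvertState state, PerDocFieldValues norm) {
    float lengthNorm = state.getBoost() / (float) Math.sqrt(state.getLength());
    norm.setFloat(lengthNorm); // setFloat(...) is an assumed setter
  }
}
{code}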

[jira] [Commented] (SOLR-2921) Make any Filters, Tokenizers and CharFilters implement MultiTermAwareComponent if they should


    [
https://issues.apache.org/jira/browse/SOLR-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160576#comment-13160576
] 

Mike Sokolov commented on SOLR-2921:
------------------------------------

I spoke hastily, and it's true that stemmers are different from those other multi-token things. It would be
kind of nice if it were possible to have a query for "do?s" actually match a document containing "dogs",
even when matching against a stemmed field, but I don't see how to do it without breaking all kinds of other
things. Consider how messed up range queries would get: [dogs TO *] would match doge, doggone, and other
words in [dog TO dogs], which would be totally counterintuitive.
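
To make that range example concrete, a small hypothetical sketch (3.x-era analysis classes and package locations assumed): stemming the boundary term "dogs" yields "dog", and plain term ordering then pulls unrelated words into the range.

{code}
import java.io.StringReader;

import org.apache.lucene.analysis.PorterStemFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class StemmedRangeBoundary {
  public static void main(String[] args) throws Exception {
    // Stem the lower bound of [dogs TO *] the way a "multiterm" analyzer would.
    TokenStream ts = new PorterStemFilter(
        new WhitespaceTokenizer(Version.LUCENE_35, new StringReader("dogs")));
    CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
    ts.reset();
    ts.incrementToken();
    String lower = term.toString(); // "dog"
    ts.end();
    ts.close();

    // Terms that sort between "dog" and "dogs" now fall inside the range:
    System.out.println("doge".compareTo(lower) > 0);     // true
    System.out.println("doggone".compareTo("dogs") < 0); // true
  }
}
{code}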

> Make any Filters, Tokenizers and CharFilters implement MultiTermAwareComponent if they should
> ---------------------------------------------------------------------------------------------
>
>                 Key: SOLR-2921
>                 URL: https://issues.apache.org/jira/browse/SOLR-2921
>             Project: Solr
>          Issue Type: Improvement
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>         Environment: All
>            Reporter: Erick Erickson
>            Assignee: Erick Erickson
>            Priority: Minor
>
> SOLR-2438 creates a new MultiTermAwareComponent interface. This allows Solr to automatically
assemble a "multiterm" analyzer that does the right thing vis-a-vis transforming the individual terms
(Continue reading)

[jira] [Commented] (SOLR-2921) Make any Filters, Tokenizers and CharFilters implement MultiTermAwareComponent if they should


    [
https://issues.apache.org/jira/browse/SOLR-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160593#comment-13160593
] 

Robert Muir commented on SOLR-2921:
-----------------------------------

Well Erick, I think the ones you listed here are OK.

There are cases where they won't work correctly, but trying to do multiterm queries with MappingCharFilter
and ASCIIFoldingFilter is already problematic (e.g. ? won't match œ because it's now 'oe').

Personally I think this is fine, but we should document that things don't work correctly all the time, and we
should not make changes to analysis components to try to make them cope with multiterm query syntax or
anything (that would be bad design; it turns them into query parsers).

If the user cares about the corner cases, they can just specify the chain explicitly.
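
For a concrete picture of the œ case, a hypothetical sketch (3.x-era classes and package locations assumed):

{code}
import java.io.StringReader;

import org.apache.lucene.analysis.ASCIIFoldingFilter;
import org.apache.lucene.analysis.KeywordTokenizer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class FoldedWildcardMismatch {
  public static void main(String[] args) throws Exception {
    // Index-time analysis folds the single character œ into the two characters "oe".
    TokenStream ts = new ASCIIFoldingFilter(
        new KeywordTokenizer(new StringReader("cœur")));
    CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
    ts.reset();
    ts.incrementToken();
    System.out.println(term.toString()); // "coeur" -- five characters
    ts.end();
    ts.close();

    // A wildcard like c?ur reserves exactly one character between c and u,
    // so even after the same folding is applied to the query it cannot
    // match the five-character indexed term "coeur".
    System.out.println("c?ur".length() + " vs " + term.length()); // 4 vs 5
  }
}
{code}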

> Make any Filters, Tokenizers and CharFilters implement MultiTermAwareComponent if they should
> ---------------------------------------------------------------------------------------------
>
>                 Key: SOLR-2921
>                 URL: https://issues.apache.org/jira/browse/SOLR-2921
(Continue reading)

[jira] [Commented] (SOLR-1726) Deep Paging and Large Results Improvements


    [
https://issues.apache.org/jira/browse/SOLR-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160607#comment-13160607
] 

Grant Ingersoll commented on SOLR-1726:
---------------------------------------

Hi Manoj,

This looks OK as a start. It would be nice to have tests to go with it.

Why the override of getTotalHits on the TopScoreDocCollector? I don't think returning collectedHits
is the right thing to do there.

Also, you should be able to avoid an extra Collector create call at:
{code}
        topCollector = TopScoreDocCollector.create(len, true);
        //Issue 1726 Start
        if(cmd.getScoreDoc() != null)
        {
            topCollector = TopScoreDocCollector.create(len, cmd.getScoreDoc(), true); //create the Collector with InOrderPagingCollector
        }

{code}

But that is easy enough to fix.
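
One way to do that (a sketch of the suggestion, not the actual patch) is to pick the collector in a single if/else so only one create call runs:

{code}
if (cmd.getScoreDoc() != null) {
  // deep paging: resume collecting after the ScoreDoc carried in the command
  topCollector = TopScoreDocCollector.create(len, cmd.getScoreDoc(), true);
} else {
  topCollector = TopScoreDocCollector.create(len, true);
}
{code}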

                
(Continue reading)

[jira] [Commented] (SOLR-2921) Make any Filters, Tokenizers and CharFilters implement MultiTermAwareComponent if they should


    [
https://issues.apache.org/jira/browse/SOLR-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160612#comment-13160612
] 

Erick Erickson commented on SOLR-2921:
--------------------------------------

Mike:

Stemmers: not going to make them MultiTermAware. No way. No how. Not on my watch; one succinct example and
I'm convinced.

The beauty of the way Yonik and Robert directed this is that we can take care of the 80% case without providing
things that are *that* surprising, and still have all the flexibility available to those who really need
it. As Robert says, if they really want some "interesting" behavior, they can specify the complete chain.

Robert:

I guess I'm at a loss as to how to write tests for the various filters and tokenizers I listed, which is why I'm
reluctant to just make them MultiTermAwareComponents. Do you have any suggestions as to how I could get
tests? I had enough surprises when I ran the tests in English that I'm reluctant to just plow ahead. As far as
I understand, Arabic is caseless, for instance.

I totally agree with your point that making the analysis components cope with syntax is evil. Not going
there either.

Maybe the right action is to wait for someone to volunteer to be the guinea pig for the various filters; I
suppose we could advertise for volunteers...

(Continue reading)

[jira] [Commented] (SOLR-1726) Deep Paging and Large Results Improvements


    [
https://issues.apache.org/jira/browse/SOLR-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160620#comment-13160620
] 

Manojkumar Rangasamy Kannadasan commented on SOLR-1726:
-------------------------------------------------------

Hi Grant, thanks for your comments. Regarding collectedHits: if there are 4 docs in the results and we
want to return only the bottom 2 by giving an appropriate pageScore and pageDoc, the expected result is to return
only those 2 docs. But totalHits returns all 4 docs. That's the reason I used collectedHits.
Kindly correct me if my understanding is wrong.
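
For what it's worth, the convention elsewhere in Lucene has been that totalHits counts every matching document, while the page actually returned is just the scoreDocs array. A minimal sketch of that distinction (hypothetical, not the patch's code):

{code}
TopDocs td = topCollector.topDocs();
int allMatches = td.totalHits;            // stays 4 in the example above if getTotalHits() is not overridden
int docsOnThisPage = td.scoreDocs.length; // 2: only the docs after the pageScore/pageDoc cursor
{code}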

> Deep Paging and Large Results Improvements
> ------------------------------------------
>
>                 Key: SOLR-1726
>                 URL: https://issues.apache.org/jira/browse/SOLR-1726
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 3.6, 4.0
>
>         Attachments: CommonParams.java, QParser.java, QueryComponent.java, ResponseBuilder.java,
SOLR-1726.patch, SolrIndexSearcher.java, TopDocsCollector.java, TopScoreDocCollector.java
>
>
> There are possibly ways to improve collections of "deep paging" by passing Solr/Lucene more information
(Continue reading)

