Michael Busch (JIRA) | 1 Oct 08:09 2007

[jira] Created: (LUCENE-1012) Problems with maxMergeDocs parameter

Problems with maxMergeDocs parameter
------------------------------------

                 Key: LUCENE-1012
                 URL: https://issues.apache.org/jira/browse/LUCENE-1012
             Project: Lucene - Java
          Issue Type: Bug
          Components: Index
            Reporter: Michael Busch
            Priority: Minor
             Fix For: 2.3

I found two possible problems regarding IndexWriter's maxMergeDocs value. I'm using the following code
to test maxMergeDocs:

{code:java} 
  public void testMaxMergeDocs() throws IOException {
    final int maxMergeDocs = 50;
    final int numSegments = 40;

    MockRAMDirectory dir = new MockRAMDirectory();
    IndexWriter writer  = new IndexWriter(dir, new WhitespaceAnalyzer(), true);      
    writer.setMergePolicy(new LogDocMergePolicy());
    writer.setMaxMergeDocs(maxMergeDocs);

    Document doc = new Document();
    doc.add(new Field("field", "aaa", Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
    for (int i = 0; i < numSegments * maxMergeDocs; i++) {
      writer.addDocument(doc);
      //writer.flush();      // uncomment to avoid the DocumentsWriter bug

Michael McCandless (JIRA) | 1 Oct 09:42 2007

[jira] Commented: (LUCENE-1011) Two or more writers over NFS can cause index corruption


    [
https://issues.apache.org/jira/browse/LUCENE-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12531428
] 

Michael McCandless commented on LUCENE-1011:
--------------------------------------------

> I'm not an expert on file locking (either in Lucene, or in the JVM,
> or any OSes) but I have to wonder if the problems you are seeing are
> inherent in the Java FileLock APIs, or if they only manifest in
> specific implementations (i.e., certain JVM impls, certain
> filesystems, certain combinations of NFS client/server, etc.)

I'm no expert either, and I continue to be rather shocked each time I
learn more!

> If we can say "NativeFSLockFactory uses the Java FileLock API to
> provide locking. FileLock is known to be buggy in the following
> situations: .... " then we've done all we can do, correct?

I agree, I think this is exactly what we should do.  I'll update the
javadoc for NativeFSLockFactory with this statement.

> Two or more writers over NFS can cause index corruption
> -------------------------------------------------------
>
>                 Key: LUCENE-1011
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1011
>             Project: Lucene - Java

Michael McCandless (JIRA) | 1 Oct 09:58 2007

[jira] Commented: (LUCENE-994) Change defaults in IndexWriter to maximize "out of the box" performance


    [
https://issues.apache.org/jira/browse/LUCENE-994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12531433
] 

Michael McCandless commented on LUCENE-994:
-------------------------------------------

> While trying Solr with the latest Lucene, I ran into this back-incompatibility:
> Caused by: java.lang.IllegalArgumentException: this method can only be called when the merge policy is LogDocMergePolicy
> at org.apache.lucene.index.IndexWriter.getLogDocMergePolicy(IndexWriter.java:316)
> at org.apache.lucene.index.IndexWriter.setMaxMergeDocs(IndexWriter.java:768)
>
> It's not an issue at all for Solr - we'll fix things up when we
> officially upgrade Lucene versions, but it does seem like it might
> affect a number of apps that try and just drop in a new lucene
> jar. Thoughts?

Hmm, good catch.

This should only happen when "setMaxMergeDocs" is called (this is the
only method that requires a LogDocMergePolicy).  I think we have
various options:

  1. Leave things as is and put an up-front comment in the release
     saying you can either switch to LogDocMergePolicy or use
     "setMaxMergeMB" on the default LogByteSizeMergePolicy instead.
     Also put details in the javadocs for this method explaining these
     options.


Michael McCandless (JIRA) | 1 Oct 10:02 2007

[jira] Commented: (LUCENE-994) Change defaults in IndexWriter to maximize "out of the box" performance


    [
https://issues.apache.org/jira/browse/LUCENE-994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12531434
] 

Michael McCandless commented on LUCENE-994:
-------------------------------------------

> It was my impression that this Lucene release would be unusual in
> that you shouldn't just drop the jar without first making sure you
> are in compliance with the new changes? Since some apps are going to
> break no matter what (few they may be) perhaps you just make a big
> fuss about possible incompatible changes?

I *think* this release should in fact "drop in" for most apps.  The
only known case where there is non-backwards compatibility (besides
this setMaxMergeDocs issue) is users of ParallelReader, I think?  I
think Lucene 3.0 is when we are "allowed" to remove deprecated APIs,
switch to Java 1.5, etc.

> Change defaults in IndexWriter to maximize "out of the box" performance
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-994
>                 URL: https://issues.apache.org/jira/browse/LUCENE-994
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.3
>            Reporter: Michael McCandless

Michael McCandless (JIRA) | 1 Oct 10:12 2007

[jira] Created: (LUCENE-1013) IndexWriter.setMaxMergeDocs gives non-backwards-compatible exception "out of the box"

IndexWriter.setMaxMergeDocs gives non-backwards-compatible exception "out of the box"
-------------------------------------------------------------------------------------

                 Key: LUCENE-1013
                 URL: https://issues.apache.org/jira/browse/LUCENE-1013
             Project: Lucene - Java
          Issue Type: Bug
          Components: Index
    Affects Versions: 2.3
            Reporter: Michael McCandless
            Assignee: Michael McCandless
             Fix For: 2.3

Yonik hit this (see details in LUCENE-994): because we have switched
to LogByteSizeMergePolicy by default in IndexWriter, which uses MB to
limit max size of merges (setMaxMergeMB), when an existing app calls
setMaxMergeDocs (or getMaxMergeDocs) it will hit an
IllegalArgumentException on dropping in the new JAR.

I think the simplest solution is to fix LogByteSizeMergePolicy to
allow setting of the max by either MB or by doc count, just as
LUCENE-1007 allows flushing by either MB or docCount or both.
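
As a rough illustration of that idea, here is a minimal sketch of a
merge-eligibility check that honors both limits. The class, method, and
parameter names are hypothetical and are not taken from
LogByteSizeMergePolicy; using Integer.MAX_VALUE to mean "no doc-count
limit" is likewise an assumption.

```java
/** Hypothetical sketch: not the actual LogByteSizeMergePolicy code. */
final class DualLimitSketch {
    private static final long BYTES_PER_MB = 1024L * 1024L;

    /**
     * Returns true if a segment exceeds either limit and so would be
     * left out of merging. Pass Integer.MAX_VALUE as maxMergeDocs to
     * disable the doc-count limit (an assumed convention).
     */
    static boolean tooLargeToMerge(long sizeInBytes, int docCount,
                                   double maxMergeMB, int maxMergeDocs) {
        long maxBytes = (long) (maxMergeMB * BYTES_PER_MB);
        return sizeInBytes > maxBytes || docCount > maxMergeDocs;
    }

    public static void main(String[] args) {
        // Small on disk, but over the doc-count limit.
        System.out.println(tooLargeToMerge(10 * BYTES_PER_MB, 60, 512.0, 50));
        // Under both limits.
        System.out.println(tooLargeToMerge(10 * BYTES_PER_MB, 40, 512.0, 50));
    }
}
```

With a check of this shape, an existing app that only ever sets a doc
count would still get a limit applied, whichever size unit the default
policy uses internally.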

--

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Michael McCandless (JIRA) | 1 Oct 10:18 2007

[jira] Created: (LUCENE-1014) IndexWriter.optimize() does not respect maxMergeDocs

IndexWriter.optimize() does not respect maxMergeDocs
----------------------------------------------------

                 Key: LUCENE-1014
                 URL: https://issues.apache.org/jira/browse/LUCENE-1014
             Project: Lucene - Java
          Issue Type: Bug
          Components: Index
    Affects Versions: 2.2, 2.1, 2.0.0, 1.9, 2.0.1, 2.3
            Reporter: Michael McCandless
            Assignee: Michael McCandless
            Priority: Minor
             Fix For: 2.3

Similar to LUCENE-1012, IndexWriter.optimize() does not respect
maxMergeDocs: it always merges the index down to one segment.

I don't think we should change this for the existing optimize() since
this would be a change in behavior.  Instead, I think that when doing
LUCENE-982 (adding an IndexWriter.optimize(int maxNumSegments) method)
we can have the new method respect maxMergeDocs.

Michael McCandless (JIRA) | 1 Oct 10:52 2007

[jira] Commented: (LUCENE-1012) Problems with maxMergeDocs parameter


    [
https://issues.apache.org/jira/browse/LUCENE-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12531440
] 

Michael McCandless commented on LUCENE-1012:
--------------------------------------------

> - It seems that DocumentsWriter does not obey the maxMergeDocs
>   parameter. If I don't flush manually, then the index only contains
>   one segment at the end and the test fails.

This bug actually predates DocumentsWriter: the flushing logic has
never respected maxMergeDocs.  I think normally maxMergeDocs is far
larger than maxBufferedDocs.

To fix this we could change the flushing logic to include "# buffered
docs > maxMergeDocs" as one of its flush criteria, if the current
merge policy is a LogMergePolicy.
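
A minimal sketch of what such an extra flush criterion might look like.
The class, method, and parameter names are hypothetical and are not
taken from IndexWriter; the existing trigger is assumed here to be a
simple doc-count threshold.

```java
/** Hypothetical sketch of the proposed flush test; not IndexWriter code. */
final class FlushCriteriaSketch {

    static boolean shouldFlush(int numBufferedDocs, int maxBufferedDocs,
                               int maxMergeDocs, boolean usesLogMergePolicy) {
        // Existing trigger, assumed to be a plain doc-count threshold.
        if (numBufferedDocs >= maxBufferedDocs) {
            return true;
        }
        // Proposed extra trigger, using the comparison as worded above.
        return usesLogMergePolicy && numBufferedDocs > maxMergeDocs;
    }

    public static void main(String[] args) {
        // maxBufferedDocs is far away, but maxMergeDocs has been passed.
        System.out.println(shouldFlush(51, 1000, 50, true));
        // Same counts with a non-Log merge policy: no extra trigger.
        System.out.println(shouldFlush(51, 1000, 50, false));
    }
}
```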

> - If I flush manually after each addDocument() call, then the index
>   contains more segments. But still, there are segments that contain 
>   more docs than maxMergeDocs, e. g. 55 vs. 50.

This behavior also predates the recent changes (MergePolicy, etc.); e.g.,
the test fails on 2.1 if you flush every 6 docs (whenever "0 == i%6").

Really the current approach is better described as "any segment with
doc count greater than maxMergeDocs will not be merged".
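
To see why that description allows segments larger than maxMergeDocs,
consider a toy model in which the limit is checked only against each
input segment, never against the merged result. This is a hypothetical
sketch, not LogMergePolicy's selection logic, and it does not try to
reproduce the exact 55-doc segment from the test.

```java
/** Toy model of a per-input-segment limit; names are hypothetical. */
final class PerSegmentLimitSketch {

    /** Sums the doc counts of the segments still eligible for merging. */
    static int mergedDocCount(int[] segmentDocCounts, int maxMergeDocs) {
        int total = 0;
        for (int docCount : segmentDocCounts) {
            // Only segments already over the limit are left alone.
            if (docCount <= maxMergeDocs) {
                total += docCount;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Ten 6-doc segments are each under the limit of 50,
        // yet merging them all yields a segment above it.
        int[] segments = {6, 6, 6, 6, 6, 6, 6, 6, 6, 6};
        System.out.println(mergedDocCount(segments, 50));
    }
}
```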


Yonik Seeley (JIRA) | 1 Oct 15:20 2007

[jira] Commented: (LUCENE-1013) IndexWriter.setMaxMergeDocs gives non-backwards-compatible exception "out of the box"


    [
https://issues.apache.org/jira/browse/LUCENE-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12531509
] 

Yonik Seeley commented on LUCENE-1013:
--------------------------------------

> fix LogByteSizeMergePolicy to allow setting of the max by either MB or by doc count
+1

> IndexWriter.setMaxMergeDocs gives non-backwards-compatible exception "out of the box"
> -------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1013
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1013
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 2.3
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 2.3
>
>
> Yonik hit this (see details in LUCENE-994): because we have switched
> to LogByteSizeMergePolicy by default in IndexWriter, which uses MB to
> limit max size of merges (setMaxMergeMB), when an existing app calls
> setMaxMergeDocs (or getMaxMergeDocs) it will hit an
> IllegalArgumentException on dropping in the new JAR.

Yonik Seeley (JIRA) | 1 Oct 15:26 2007

[jira] Commented: (LUCENE-1012) Problems with maxMergeDocs parameter


    [
https://issues.apache.org/jira/browse/LUCENE-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12531510
] 

Yonik Seeley commented on LUCENE-1012:
--------------------------------------

> We could just fix the javadocs to match the current approach?
That sounds like the right approach.

> Problems with maxMergeDocs parameter
> ------------------------------------
>
>                 Key: LUCENE-1012
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1012
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>            Reporter: Michael Busch
>            Priority: Minor
>             Fix For: 2.3
>
>
> I found two possible problems regarding IndexWriter's maxMergeDocs value. I'm using the following code
> to test maxMergeDocs:
> {code:java} 
>   public void testMaxMergeDocs() throws IOException {
>     final int maxMergeDocs = 50;
>     final int numSegments = 40;

Ning Li (JIRA) | 1 Oct 16:04 2007

[jira] Commented: (LUCENE-1007) Flexibility to turn on/off any flush triggers


    [
https://issues.apache.org/jira/browse/LUCENE-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12531513
] 

Ning Li commented on LUCENE-1007:
---------------------------------

One more thing about approximating the actual bytes used for buffered delete terms: remember that
Integer.SIZE returns the number of bits used, so it should be converted to a number of bytes.
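
A small sketch of the conversion. Integer.SIZE and Byte.SIZE are
standard Java constants; the class and the approxBytesForInts helper
are hypothetical, and how the patch actually accounts for buffered
delete terms is not shown here.

```java
/** Bits-to-bytes conversion for int-sized fields; helper names are hypothetical. */
final class IntSizeSketch {

    /** Integer.SIZE is in bits, so divide by the bits per byte. */
    static int bytesPerInt() {
        return Integer.SIZE / Byte.SIZE;
    }

    /** Approximate bytes taken by numInts int values, ignoring object overhead. */
    static long approxBytesForInts(int numInts) {
        return (long) numInts * bytesPerInt();
    }

    public static void main(String[] args) {
        System.out.println(Integer.SIZE);    // 32 (bits)
        System.out.println(bytesPerInt());   // 4 (bytes)
    }
}
```

Using Integer.SIZE directly as a byte count would overestimate the
int-sized part of the approximation by a factor of eight.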

> Flexibility to turn on/off any flush triggers
> ---------------------------------------------
>
>                 Key: LUCENE-1007
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1007
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Ning Li
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.3
>
>         Attachments: LUCENE-1007.patch, LUCENE-1007.take2.patch, LUCENE-1007.take3.patch
>
>
> See discussion at http://www.gossamer-threads.com/lists/lucene/java-dev/53186
> Provide the flexibility to turn on/off any flush triggers - ramBufferSize, maxBufferedDocs and
> maxBufferedDeleteTerms. One of ramBufferSize and maxBufferedDocs must be enabled.

