Paul Elschot (JIRA) | 1 Sep 2007 22:21

[jira] Commented: (LUCENE-584) Decouple Filter from BitSet


    [
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12524301
] 

Paul Elschot commented on LUCENE-584:
-------------------------------------

Another way to decouple from BitSet would be to introduce a new superclass of Filter that has only an
abstract getMatcher() method, and to add an implementation of that method in the current Filter class.
That would boil down to the current patch with two classes renamed:
Filter ->  new class with abstract getMatcher() method.
BitSetFilter -> Filter.

This would avoid all backward compatibility issues, except for the unlikely case in which a getMatcher()
method is already implemented in an existing subclass of Filter.
Also, to take advantage of the independence from BitSet in other implementations, only this new class would
need to be used.
The only disadvantage I can see is that Filter is not renamed to BitSetFilter, which is what it actually is. But that
can be fixed by making the javadoc of Filter explicit about the use of BitSet.

For the Lucene core and some of the contrib code, this would mean switching to this new superclass of
Filter. Again, I don't expect backward compatibility issues there.

Does anyone see any problems with this approach?
If not, what name should this new superclass of Filter have? I'm thinking of MatchFilter; any other suggestions?
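
A minimal sketch of the proposed split, under simplifying assumptions: Matcher here is a hypothetical two-method iterator (the LUCENE-584 patch defines its own), MatchFilter is the tentative name suggested above, and the IndexReader parameter that the real Filter.bits() takes is dropped so the example compiles on its own. Existing Filter subclasses keep implementing only bits(), and getMatcher() is inherited.

```java
import java.util.BitSet;

// Hypothetical, simplified iterator over matching document numbers.
abstract class Matcher {
  /** Advances to the next matching document; returns false when exhausted. */
  abstract boolean next();

  /** The current document number; valid only after next() returned true. */
  abstract int doc();
}

// Proposed new superclass: knows nothing about BitSet.
abstract class MatchFilter {
  abstract Matcher getMatcher();
}

// The existing Filter keeps its BitSet-based bits() method and gains a
// getMatcher() implementation that iterates over the set bits.
abstract class Filter extends MatchFilter {
  abstract BitSet bits(); // the real method takes an IndexReader

  @Override
  Matcher getMatcher() {
    final BitSet set = bits();
    return new Matcher() {
      private int current = -1;
      private boolean exhausted = false;

      @Override
      boolean next() {
        if (exhausted) {
          return false;
        }
        current = set.nextSetBit(current + 1);
        if (current < 0) {
          exhausted = true; // stay exhausted instead of restarting at bit 0
          return false;
        }
        return true;
      }

      @Override
      int doc() {
        return current;
      }
    };
  }
}
```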

> Decouple Filter from BitSet
> ---------------------------
>

Yonik Seeley | 3 Sep 2007 16:35

Re: Optimize and internal document order

On 8/31/07, Doug Cutting <cutting <at> apache.org> wrote:
> If each document has an indexed id field in both indexes, then you could
> simply use a FieldCache of that id field in each index to determine the
> mapping.

I've thought about this approach, but I think it has some scalability issues...
It seems like the re-mapped ids from the secondary index would be
out-of-order, and thus would require caching & sorting of both ids and
scores from that index.
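
To make that cost concrete, here is a hedged sketch of the id-field remapping. It assumes each index's id values are already available as a String[] indexed by document number (roughly what a FieldCache string lookup on the id field provides); IdRemapper, remap and DocScore are hypothetical names. Because secondary hits arrive in secondary-index order, their primary-index numbers come out unordered, so every (doc, score) pair has to be materialized and sorted before it can be consumed in increasing doc order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical pair of a primary-index document number and a score.
final class DocScore {
  final int doc;
  final float score;

  DocScore(int doc, float score) {
    this.doc = doc;
    this.score = score;
  }
}

final class IdRemapper {
  /** Maps secondary-index hits onto primary-index doc numbers via a shared id field. */
  static List<DocScore> remap(String[] primaryIds, String[] secondaryIds, float[] secondaryScores) {
    // id value -> primary document number
    Map<String, Integer> primaryDocById = new HashMap<>();
    for (int doc = 0; doc < primaryIds.length; doc++) {
      primaryDocById.put(primaryIds[doc], doc);
    }
    // Every hit must be cached, because the remapped numbers are out of order.
    List<DocScore> hits = new ArrayList<>();
    for (int doc = 0; doc < secondaryIds.length; doc++) {
      Integer primaryDoc = primaryDocById.get(secondaryIds[doc]);
      if (primaryDoc != null) {
        hits.add(new DocScore(primaryDoc, secondaryScores[doc]));
      }
    }
    // ...and sorted: O(h log h) time and O(h) extra memory for h hits.
    hits.sort((a, b) -> Integer.compare(a.doc, b.doc));
    return hits;
  }
}
```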

-Yonik

Andrzej Bialecki | 3 Sep 2007 20:16

Re: Optimize and internal document order

Yonik Seeley wrote:
> On 8/31/07, Doug Cutting <cutting <at> apache.org> wrote:
>> If each document has an indexed id field in both indexes, then you could
>> simply use a FieldCache of that id field in each index to determine the
>> mapping.
> 
> I've thought about this approach, but I think it has some scalability issues...
> It seems like the re-mapped ids from the secondary index would be
> out-of-order, and thus would require caching & sorting of both ids and
> scores from that index.

Thanks for the input. For now I have given up on this, after discovering that
I would have no way to ensure in skipTo() that document ids are
monotonically increasing (which seems to be part of the contract).
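
For reference, a minimal sketch of the ordering contract in question, over a hypothetical sorted array of document numbers (SortedDocIterator is an illustrative name, not a Lucene class). Its skipTo() is the simple next()-based loop: it always advances at least once, then stops at the first document greater than or equal to the target. A caller is therefore never handed a smaller document number than one it has already seen, which is exactly what an out-of-order remapping cannot provide.

```java
// Hypothetical iterator over an ascending array of document numbers.
final class SortedDocIterator {
  private final int[] docs; // strictly increasing, checked in the constructor
  private int pos = -1;

  SortedDocIterator(int[] docs) {
    for (int i = 1; i < docs.length; i++) {
      if (docs[i] <= docs[i - 1]) {
        throw new IllegalArgumentException("doc ids must be strictly increasing");
      }
    }
    this.docs = docs.clone();
  }

  boolean next() {
    if (pos < docs.length) {
      pos++;
    }
    return pos < docs.length;
  }

  int doc() {
    return docs[pos];
  }

  /** Advances past the current entry to the first document >= target. */
  boolean skipTo(int target) {
    do {
      if (!next()) {
        return false;
      }
    } while (target > doc());
    return true;
  }
}
```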

-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

Nat (JIRA) | 4 Sep 2007 13:42

[jira] Commented: (LUCENE-743) IndexReader.reopen()


    [
https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12524694
] 

Nat commented on LUCENE-743:
----------------------------

Please also consider adding an option to reopen the reader automatically (i.e. when the index is updated)
instead of having to call reopen() explicitly. Thread safety should be taken into account as well.
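
One possible reading of this request, sketched under loud assumptions: ReaderSource, currentVersion() and open() are hypothetical stand-ins for "has the index changed?" and "open (or reopen) a reader"; none of this is the LUCENE-743 API. The holder publishes the current reader through an AtomicReference, so searcher threads always see a fully constructed reader, and a background task could call maybeRefresh() on a schedule. Safely closing the old reader while searches may still be using it (for example via reference counting) is deliberately left out.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical abstraction over an index: a version number plus a way to open a reader.
interface ReaderSource<R> {
  long currentVersion();

  R open();
}

final class AutoRefreshingReader<R> {
  private static final class Snapshot<T> {
    final long version;
    final T reader;

    Snapshot(long version, T reader) {
      this.version = version;
      this.reader = reader;
    }
  }

  private final ReaderSource<R> source;
  private final AtomicReference<Snapshot<R>> current;

  AutoRefreshingReader(ReaderSource<R> source) {
    this.source = source;
    Snapshot<R> initial = new Snapshot<>(source.currentVersion(), source.open());
    this.current = new AtomicReference<>(initial);
  }

  /** Reader to search with; never blocks. */
  R get() {
    return current.get().reader;
  }

  /** Reopens if the index version changed; synchronized so refreshes never overlap. */
  synchronized boolean maybeRefresh() {
    long v = source.currentVersion();
    if (v == current.get().version) {
      return false;
    }
    current.set(new Snapshot<>(v, source.open()));
    return true;
  }
}
```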

> IndexReader.reopen()
> --------------------
>
>                 Key: LUCENE-743
>                 URL: https://issues.apache.org/jira/browse/LUCENE-743
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Otis Gospodnetic
>            Assignee: Michael Busch
>            Priority: Minor
>             Fix For: 2.3
>
>         Attachments: IndexReaderUtils.java, lucene-743.patch, lucene-743.patch, lucene-743.patch, MyMultiReader.java, MySegmentReader.java
>
>
> This is Robert Engels' implementation of IndexReader.reopen() functionality, as a set of 3 new classes
> (this was easier for him to implement, but should probably be folded into the core, if this looks good).

Paul Elschot | 4 Sep 2007 20:22

Possible thread safety problem in CachingWrapperFilter


I'm trying to change the CachingWrapperFilter class to cache a
SortedVIntList for LUCENE-584. That is progressing nicely, but
I found this snippet at the beginning of the current
CachingWrapperFilter.bits() method:

    if (cache == null) {
      cache = new WeakHashMap();
    }

I think the initial snippet is not thread safe and might result
in two threads initializing this cache to different objects,
possibly conflicting with the cache accesses after that:

synchronized (cache) { ... cache.get(...); } 
...
synchronized (cache) { cache.put(...); } 

Would this be safe to initialize the cache:

synchronized(this) {
  if (cache == null) {
    cache = new WeakHashMap();
  }
}

and should the cache accesses also use synchronized(this)?
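
A minimal sketch of the proposal in this message, assuming a simplified stand-in for CachingWrapperFilter: LazyCachingFilter is a hypothetical name, the reader key is a plain Object, and the wrapped filter's bits(reader) call is passed in as a Function. Lazy creation, get and put all synchronize on the same object (this), so no thread can replace a cache another thread is already using. The BitSet is computed outside the lock; two threads may occasionally compute the same entry at once, in which case the later put simply wins, which is harmless for a cache.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

// Hypothetical, simplified stand-in for CachingWrapperFilter.
class LazyCachingFilter {
  private final Function<Object, BitSet> wrappedBits; // plays the role of filter.bits(reader)
  private Map<Object, BitSet> cache; // created lazily; guarded by "this"

  LazyCachingFilter(Function<Object, BitSet> wrappedBits) {
    this.wrappedBits = wrappedBits;
  }

  BitSet bits(Object reader) {
    synchronized (this) {
      if (cache == null) {
        cache = new WeakHashMap<>();
      }
      BitSet cached = cache.get(reader);
      if (cached != null) {
        return cached;
      }
    }
    // Computed outside the lock so a slow filter does not block other threads.
    BitSet result = wrappedBits.apply(reader);
    synchronized (this) {
      cache.put(reader, result); // cache is non-null: it is never reset after creation
    }
    return result;
  }
}
```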

Regards,
Paul Elschot

Michael McCandless (JIRA) | 4 Sep 2007 20:28

[jira] Created: (LUCENE-992) IndexWriter.updateDocument is no longer atomic

IndexWriter.updateDocument is no longer atomic
----------------------------------------------

                 Key: LUCENE-992
                 URL: https://issues.apache.org/jira/browse/LUCENE-992
             Project: Lucene - Java
          Issue Type: Bug
          Components: Index
    Affects Versions: 2.2
            Reporter: Michael McCandless
            Assignee: Michael McCandless
            Priority: Minor
             Fix For: 2.3

Spinoff from LUCENE-847.

Ning caught that as of LUCENE-843, we lost the atomicity of the delete
+ add in IndexWriter.updateDocument.

Ning suggested a simple fix: move the buffered deletes into
DocumentsWriter and let it do the delete + add atomically.  This has a
nice side effect of also consolidating the "time to flush" logic in
DocumentsWriter.
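
The idea behind the fix can be illustrated with a toy in-memory model. This is not Lucene code: ToyWriter and its methods are hypothetical, and each list entry simply stands for one document's id. If the delete and the add are applied together under the same lock that readers use to take their count, no reader can observe the state between the two halves of an update.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: each entry is one document's id. Not Lucene's DocumentsWriter.
final class ToyWriter {
  private final List<String> docIds = new ArrayList<>();

  synchronized void addDocument(String id) {
    docIds.add(id);
  }

  /** Delete-by-id plus add, applied as one step under a single lock. */
  synchronized void updateDocument(String id) {
    docIds.removeIf(id::equals); // the "delete" half
    docIds.add(id);              // the "add" half
  }

  /** A reader sees the state before or after an update, never in between. */
  synchronized int numDocs() {
    return docIds.size();
  }
}
```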

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Michael McCandless (JIRA) | 4 Sep 2007 20:34

[jira] Updated: (LUCENE-992) IndexWriter.updateDocument is no longer atomic


     [
https://issues.apache.org/jira/browse/LUCENE-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-992:
--------------------------------------

    Attachment: LUCENE-992.patch

Attached patch.

I added a unit test that runs 2 indexing threads (calling
updateDocument) and 2 reader threads and asserts in the reader threads
that the number of documents never changes.

I also slightly changed the exception semantics in IndexWriter:
previously if a disk full (or other) exception was hit when flushing
the buffered docs, the buffered deletes were retained but the
partially flushed buffered docs (if any) were discarded.  I think this
was actually a bug because the buffered deletes must also be discarded
since they refer to document numbers that are no longer valid.  So I
changed it to also clear buffered deletes on exception, and had to
change one unit test (TestIndexWriterDelete) to match this.
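
The revised exception semantics can be sketched with a hypothetical buffer. PendingChanges, Sink and flush are illustrative names only, and buffered deletes are simplified to plain document numbers into the buffered docs. Since those deletes only make sense relative to the buffered documents, a failed flush that discards the documents must discard the deletes too; the finally block clears both together.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical buffer of not-yet-flushed changes; not Lucene's IndexWriter.
final class PendingChanges {
  /** Hypothetical flush destination; may throw, e.g. when the disk is full. */
  interface Sink {
    void write(List<String> docs, List<Integer> deletedDocNums) throws Exception;
  }

  final List<String> bufferedDocs = new ArrayList<>();
  final List<Integer> bufferedDeletes = new ArrayList<>(); // doc numbers into bufferedDocs

  void flush(Sink sink) throws Exception {
    try {
      sink.write(new ArrayList<>(bufferedDocs), new ArrayList<>(bufferedDeletes));
    } finally {
      // On success the buffers are consumed; on failure the buffered docs are
      // discarded, so the deletes that refer to their numbers are discarded too.
      bufferedDocs.clear();
      bufferedDeletes.clear();
    }
  }
}
```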

> IndexWriter.updateDocument is no longer atomic
> ----------------------------------------------
>
>                 Key: LUCENE-992
>                 URL: https://issues.apache.org/jira/browse/LUCENE-992
>             Project: Lucene - Java

Michael McCandless (JIRA) | 4 Sep 2007 20:34

[jira] Updated: (LUCENE-992) IndexWriter.updateDocument is no longer atomic


     [
https://issues.apache.org/jira/browse/LUCENE-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-992:
--------------------------------------

    Lucene Fields: [New, Patch Available]  (was: [New])

> IndexWriter.updateDocument is no longer atomic
> ----------------------------------------------
>
>                 Key: LUCENE-992
>                 URL: https://issues.apache.org/jira/browse/LUCENE-992
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 2.2
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.3
>
>         Attachments: LUCENE-992.patch
>
>
> Spinoff from LUCENE-847.
> Ning caught that as of LUCENE-843, we lost the atomicity of the delete
> + add in IndexWriter.updateDocument.
> Ning suggested a simple fix: move the buffered deletes into

Chris Hostetter | 4 Sep 2007 21:03

Re: Possible thread safety problem in CachingWrapperFilter


:     if (cache == null) {
:       cache = new WeakHashMap();
:     }
: 
: I think the initial snippet is not thread safe and might result
: in two threads initializing this cache to different objects,
: possibly conflicting with the cache accesses after that:

I believe you are right ... if Thread A evaluates the "if (cache == null)" 
op and then Thread B is given priority and executes the entire method, 
when Thread A resumes it will blow away the old cache instance (and all of 
B's hard work)

I suspect there wasn't a synchronized block around that bit of code because 
synchronizing on an expression that evaluates to null throws an NPE.
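
This is easy to confirm: the Java Language Specification requires a synchronized statement whose lock expression evaluates to null to throw a NullPointerException, so the lazy null check cannot simply be wrapped in synchronized (cache). A tiny demonstration (NullLockDemo is just an illustrative name):

```java
// Shows that synchronizing on a null reference throws NullPointerException.
final class NullLockDemo {
  private Object cache; // still null, as before lazy initialization

  boolean lockingOnNullThrows() {
    try {
      synchronized (cache) {
        return false; // never reached while cache is null
      }
    } catch (NullPointerException expected) {
      return true;
    }
  }
}
```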

: Would this be safe to initialize the cache:
: 
: synchronized(this) {
:     if (cache == null) {
:       cache = new WeakHashMap();
:     }
: }

for the life of me I can't imagine why "cache = new WeakHashMap();" isn't 
just in the constructor. Then it's guaranteed to execute only once.
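
A sketch of that suggestion, again with a hypothetical simplified class (EagerCachingFilter, an Object reader key and a Function standing in for the wrapped filter). Creating the map in the constructor and storing it in a final field removes the check-then-act race entirely, and because the field never changes and is never null, synchronized (cache) becomes a safe lock for the get and the put.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

// Hypothetical, simplified stand-in for CachingWrapperFilter.
class EagerCachingFilter {
  private final Function<Object, BitSet> wrappedBits; // plays the role of filter.bits(reader)
  // Created exactly once; a final field is safely published to all threads.
  private final Map<Object, BitSet> cache = new WeakHashMap<>();

  EagerCachingFilter(Function<Object, BitSet> wrappedBits) {
    this.wrappedBits = wrappedBits;
  }

  BitSet bits(Object reader) {
    synchronized (cache) {
      BitSet cached = cache.get(reader);
      if (cached != null) {
        return cached;
      }
    }
    BitSet result = wrappedBits.apply(reader); // computed outside the lock
    synchronized (cache) {
      cache.put(reader, result);
    }
    return result;
  }
}
```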

: and should the cache accesses also use synchronized(this) ?


Paul Elschot | 4 Sep 2007 22:00

Re: Possible thread safety problem in CachingWrapperFilter


On Tuesday 04 September 2007 21:03, Chris Hostetter wrote:
> 
> :     if (cache == null) {
> :       cache = new WeakHashMap();
> :     }
> : 
> : I think the initial snippet is not thread safe and might result
> : in two threads initializing this cache to different objects,
> : possibly conflicting with the cache accesses after that:
> 
> I believe you are right ... if Thread A evaluates the "if (cache == null)" 
> op and then Thread B is given priority and executes the entire method, 
> when Thread A resumes it will blow away the old cache instance (and all of 
> B's hard work)
> 
> I suspect there wasn't a synchronized block around that bit of code because 
> synchronizing on an expression that evaluates to null throws an NPE.
> 
> : Would this be safe to initialize the cache:
> : 
> : synchronized(this) {
> :     if (cache == null) {
> :       cache = new WeakHashMap();
> :     }
> : }
> 
> for the life of me I can't imagine why "cache = new WeakHashMap();" isn't 
> just in the constructor. Then it's guaranteed to execute only once.


