John Wang | 2 May 00:38 2010

Re: Nasty NIO behavior makes NIOFSDirectory silently close channel

We are seeing this issue as well in our production (using Zoie on top of Lucene 2.9.1).

After some performance comparisons, we do NOT see any performance gain with NIO, only these nasty ClosedChannelExceptions.

I think the performance gains people are seeing with 2.9.1 could be due to many different things. From what we have seen, they are not related to NIOFSDirectory.

Our solution is to avoid calling FSDirectory.open() and instead call new SimpleFSDirectory() directly. Is this safe?
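
For reference, the failure mode behind these ClosedChannelExceptions can be reproduced with plain JDK classes, no Lucene involved. The sketch below is only an illustration: the class name InterruptClosesChannel and the temp file are made up, and setting the interrupt flag by hand stands in for whatever interrupts the searching thread in a real application (for example Future.cancel(true)).

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

public class InterruptClosesChannel {

    /** Returns what happened at each of the three steps. */
    static String[] demo() {
        String[] steps = new String[3];
        try {
            Path file = Files.createTempFile("nio-demo", ".bin");
            try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
                FileChannel channel = raf.getChannel();

                // Step 1: read with the interrupt flag already set.
                Thread.currentThread().interrupt();
                try {
                    channel.read(ByteBuffer.allocate(4), 0);
                    steps[0] = "read succeeded";
                } catch (ClosedByInterruptException e) {
                    steps[0] = "ClosedByInterruptException";
                } finally {
                    Thread.interrupted(); // clear the flag for the rest of the demo
                }

                // Step 2: the channel itself is now closed, for every thread.
                steps[1] = channel.isOpen() ? "open" : "closed";

                // Step 3: a later read, with no interrupt pending, still fails.
                try {
                    channel.read(ByteBuffer.allocate(4), 0);
                    steps[2] = "read succeeded";
                } catch (ClosedChannelException e) {
                    steps[2] = "ClosedChannelException";
                }
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return steps;
    }

    public static void main(String[] args) {
        for (String step : demo()) {
            System.out.println(step);
        }
    }
}
```

The third step is the nasty part: the thread that hits the ClosedChannelException is not necessarily the one that was interrupted.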

-John

On Fri, Jan 29, 2010 at 12:32 PM, Mark Miller <markrmiller <at> gmail.com> wrote:
Perhaps - one of the things they are supposed to be addressing is
extensibility.

nio2 does have FileSystemProvider, which would actually allow you to
create a custom channel!

I have not dug in enough to know much more than that though.

*But*, another really interesting thing is that in Java 7,
FileDescriptors are ref counted! (though users can't inc/dec).

But, FileInputStream and OutputStream have a new constructor that takes
a FileDescriptor.

So possibly, you could just make one that sits around to keep the
FileDescriptor valid, and get your channel off
FileInputStream/FileOutputStream?

And then if it goes down, make a new one using the FileDescriptor which
was not actually closed because there was still a ref to it.

Possibly .... ;)

Michael McCandless wrote:
> Does anyone know if nio2 has improved this...?
>
> Mike
>
> On Fri, Jan 29, 2010 at 2:00 PM, Jason Rutherglen
> <jason.rutherglen <at> gmail.com> wrote:
>
>> Defaulting NIOFSDir could account for some of the recent speed
>> improvements users have been reporting in Lucene 2.9.  So removing it
>> as a default could reverse those and people could then report Lucene
>> 3.X has slowed...
>>
>> On Thu, Jan 28, 2010 at 5:24 AM, Michael McCandless
>> <lucene <at> mikemccandless.com> wrote:
>>
>>> Bummer.
>>>
>>> So the only viable workarounds are 1) don't use Thread.interrupt (nor
>>> things like Future.cancel, which in turn use Thread.interrupt) with
>>> NIOFSDir, or 2) we fix NIOFSDir to reopen the channel AND the app must
>>> make a deletion policy that keeps a commit alive if any reader is
>>> using it.  Or, 3) don't use NIOFSDir!
>>>
>>> Mike
>>>
>>> On Thu, Jan 28, 2010 at 7:29 AM, Simon Willnauer
>>> <simon.willnauer <at> googlemail.com> wrote:
>>>
>>>> On Thu, Jan 28, 2010 at 12:43 PM, Michael McCandless
>>>> <lucene <at> mikemccandless.com> wrote:
>>>>
>>>>> On Thu, Jan 28, 2010 at 6:38 AM, Uwe Schindler <uwe <at> thetaphi.de> wrote:
>>>>>
>>>>>
>>>>>> So I checked the code of NIOFSIndexInput, my last comment was not really correct:
>>>>>> NIOFSIndexInput extends SimpleFSIndexInput and that opens the RAF. In the ctor RAF.getChannel() is called. The RAF stays open until the file is closed (and so does the channel).
>>>>>>
>>>>>> So it's really simple to fix in my opinion: just call getChannel() again on this exception, because the RAF should still be open?
>>>>>>
>>>> Short answer:
>>>>  public final FileChannel getChannel() {
>>>>        synchronized (this) {
>>>>            if (channel == null)
>>>>                channel = FileChannelImpl.open(fd, true, rw, this);
>>>>            return channel;
>>>>        }
>>>>    }
>>>>
>>>> This is not going to work; I tried it before. The RandomAccessFile caches
>>>> the channel!!
>>>>
>>>> simon
>>>>
>>>>> I think we need a definitive answer on what happens to the RAF when
>>>>> the FileChannel was closed by Thread.interrupt.  Simon can you test
>>>>> this?
>>>>>
>>>>> Mike
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-dev-unsubscribe <at> lucene.apache.org
>>>>> For additional commands, e-mail: java-dev-help <at> lucene.apache.org
>
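
Simon's point in the quoted thread, that RandomAccessFile hands back the same cached channel, can be checked with a small JDK-only sketch. This is not NIOFSDirectory code; the class name ReopenAfterClose is made up, and the explicit close() merely stands in for a close triggered by an interrupt. The second half shows roughly what "reopen the channel" (Mike's option 2) would have to do instead: open a fresh RandomAccessFile by path, which assumes the file still exists on disk.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReopenAfterClose {

    /** Returns {same cached instance?, cached channel open?, bytes read after reopening}. */
    static Object[] demo() {
        try {
            Path file = Files.createTempFile("raf-demo", ".bin");
            try {
                Files.write(file, new byte[] {1, 2, 3, 4});

                RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r");
                FileChannel first = raf.getChannel();
                first.close(); // stand-in for the interrupt-triggered close

                // Asking the same RAF again returns the cached, closed channel.
                FileChannel second = raf.getChannel();
                boolean sameInstance = (first == second);
                boolean stillOpen = second.isOpen();
                raf.close();

                // Reopening by path gives a usable channel again.
                int bytesRead;
                try (RandomAccessFile fresh = new RandomAccessFile(file.toFile(), "r")) {
                    bytesRead = fresh.getChannel().read(ByteBuffer.allocate(4), 0);
                }
                return new Object[] {sameInstance, stillOpen, bytesRead};
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Object[] r = demo();
        System.out.println("same=" + r[0] + " open=" + r[1] + " bytesRead=" + r[2]);
    }
}
```

Reopening by path is also why option 2 needs a deletion policy that keeps the commit alive: if the file has been deleted in the meantime, there is nothing left to reopen.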


--
- Mark

http://www.lucidimagination.com






Yonik Seeley | 2 May 19:32 2010

BytesRef comparable

Any objections to making BytesRef comparable?  It would make it much
easier to use with containers that don't take comparators as
parameters.
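
To make the idea concrete, here is a rough sketch of what natural ordering buys you. The Bytes class below is a made-up stand-in, not the real BytesRef, and comparing bytes as unsigned values is an assumption about the ordering one would want.

```java
import java.util.TreeSet;

public class ComparableBytesDemo {

    /** Hypothetical stand-in for BytesRef: a slice of a byte[] with a natural order. */
    static final class Bytes implements Comparable<Bytes> {
        final byte[] bytes;
        final int offset;
        final int length;

        Bytes(byte[] bytes, int offset, int length) {
            this.bytes = bytes;
            this.offset = offset;
            this.length = length;
        }

        /** Convenience factory so callers can write of(0x7f, 0x01) without casts. */
        static Bytes of(int... values) {
            byte[] b = new byte[values.length];
            for (int i = 0; i < values.length; i++) {
                b[i] = (byte) values[i];
            }
            return new Bytes(b, 0, b.length);
        }

        @Override
        public int compareTo(Bytes other) {
            int n = Math.min(length, other.length);
            for (int i = 0; i < n; i++) {
                // Mask to compare as unsigned: 0x80 must sort after 0x7f.
                int a = bytes[offset + i] & 0xff;
                int b = other.bytes[other.offset + i] & 0xff;
                if (a != b) {
                    return a - b;
                }
            }
            return length - other.length; // a prefix sorts before the longer slice
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Bytes && compareTo((Bytes) o) == 0;
        }

        @Override
        public int hashCode() {
            int h = 0;
            for (int i = 0; i < length; i++) {
                h = 31 * h + bytes[offset + i];
            }
            return h;
        }

        @Override
        public String toString() {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < length; i++) {
                sb.append(String.format("%02x", bytes[offset + i] & 0xff));
            }
            return sb.toString();
        }
    }

    /** A TreeSet built with no Comparator relies purely on compareTo. */
    static String sortedHex() {
        TreeSet<Bytes> set = new TreeSet<Bytes>();
        set.add(Bytes.of(0x80));
        set.add(Bytes.of(0x7f, 0x01));
        set.add(Bytes.of(0x7f));
        return set.toString();
    }

    public static void main(String[] args) {
        System.out.println(sortedHex()); // [7f, 7f01, 80]
    }
}
```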

-Yonik
Apache Lucene Eurocon 2010
18-21 May 2010 | Prague
Uwe Schindler | 5 May 00:03 2010

Lucene 3.x branch created

Hi all,

we worked hard today and created the "stable" Lucene 3.x branch, which will be released soon as version 3.1
and later as further 3.x releases. As soon as 3.1 is released, a corresponding "branch_31" will be created from this
branch (and not from trunk):

	https://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x

This was created from the last pre-flex commit and also contains a lot of revisions merged from post-flex
(like CharTermAttribute). This makes merging newer Analyzers/TokenStreams and so on easier. We also
fixed some bugs shortly before flex, so the stable branch is now stable. I also refactored the change log.

Now the TODO is:

- Merge the rest of the post-flex developments, like lots of analyzer improvements, up to the current trunk status.
This should ideally be done with a GUI tool that shows what has already been merged (there are lots of
revisions, see the merge property; only selected revisions are merged). Most commits came from rmuir; he
will also use TortoiseSVN (like I did for the merge).
- The CHANGES.txt entries for all these merges move from the trunk section to the branch section (in trunk's
CHANGES.txt) and are also added to the branch's CHANGES.txt.
- Do the same refactoring of Solr's CHANGES.txt (I have not touched it so far). It's out of my scope, so somebody
else should do this.

I also added Hudson build jobs for this branch. We have now:

	http://hudson.zones.apache.org/hudson/job/Lucene-3.x/
	http://hudson.zones.apache.org/hudson/job/Solr-3.x/

The development of trunk will continue as usual at:

	https://svn.apache.org/repos/asf/lucene/dev/trunk

With Hudson jobs:

	http://hudson.zones.apache.org/hudson/job/Lucene-trunk/
	http://hudson.zones.apache.org/hudson/job/Solr-trunk/

!!! BUT !!!: It will have no backwards compatibility, although some revisions can still be merged back (with an
added backwards layer, on a case-by-case basis). Flex and flex-only features will not be ported back
(like automaton queries). This version will be released as 4.0 (this may also happen soon). This
development branch is for all new developments without any need to be backwards compatible. Even the
index format can change (and will). We will only provide a conversion tool that can convert indexes from
the last "branch_3x" up to this trunk (4.0) release, so they can be read later, but their terms may not match
what all current analyzers produce, so most people will need to reindex. Older indexes cannot be read
natively without conversion first (with maybe loss of analyzer compatibility).

This index format conversion tool (it has no name yet) will convert to the new flex format and may also
change the order of terms in TermsEnum to native byte[] order (which is Unicode code point order and not
the current UTF-16 order). Numeric fields and collation keys may also be converted to a full 8-bit term format
(not yet decided), in which case they would no longer be UTF-16 terms.
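
To illustrate the difference between the two term orders, here is a small JDK-only sketch (the class name TermOrderDemo and the two sample characters are just picked for illustration). The orders only disagree when a supplementary character, stored as a surrogate pair in UTF-16, is compared with a BMP character above the surrogate range.

```java
import java.nio.charset.StandardCharsets;

public class TermOrderDemo {

    /** Compares two strings by their UTF-8 bytes, treating each byte as unsigned. */
    static int compareUtf8(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int d = (x[i] & 0xff) - (y[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return x.length - y.length;
    }

    public static void main(String[] args) {
        String ligature = "\uFB01";   // U+FB01, a single UTF-16 code unit
        String clef = "\uD834\uDD1E"; // U+1D11E, a surrogate pair in UTF-16

        // UTF-16 order (String.compareTo): 0xD834 < 0xFB01, so the clef sorts first.
        System.out.println("UTF-16: ligature after clef?  " + (ligature.compareTo(clef) > 0));

        // UTF-8 byte order: EF.. < F0.., so the ligature sorts first,
        // which agrees with code point order (0xFB01 < 0x1D11E).
        System.out.println("UTF-8:  ligature before clef? " + (compareUtf8(ligature, clef) < 0));
    }
}
```

Both lines print true, so any code that depends on the enumeration order of such terms would see them swap places after conversion.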

We will also factor out all analyzers/tokenstreams, so trunk will only contain the abstract TokenStream
and Analyzer base classes with slightly changed API. All the actual analysis classes will be moved to modules.

Happy coding!

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe <at> thetaphi.de
karl.wright | 24 May 15:10 2010

Solr updateRequestHandler and performance vs. atomicity

Hi all,
 
It seems to me that the “commit” logic in the Solr updateRequestHandler (or wherever the logic is actually located) conflates two different semantics.  One semantic is what you need to do to make the index process perform well.  The other semantic is guaranteed atomicity of document reception by Solr.
 
In particular, it would be nice to be able to post documents in such a way that you can guarantee that the document is permanently in Solr’s queue, safe in the event of a Solr restart, etc., even if the document has not yet been “committed”.
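
As a rough sketch of the separation I have in mind (this is not Solr's API; the DurableInbox class and its methods are hypothetical, and documents are assumed to be single-line strings): accepting a document means forcing it to a log on disk, and committing is a separate, later step.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

public class DurableInbox {
    private final Path log;

    DurableInbox(Path log) {
        this.log = log;
    }

    /** Returns only after the document is synced to the log: durable, but not yet committed. */
    void accept(String doc) throws IOException {
        byte[] line = (doc + "\n").getBytes(StandardCharsets.UTF_8);
        Files.write(log, line,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND, StandardOpenOption.SYNC);
    }

    /** Documents accepted but not yet committed; read from disk, so it survives a restart. */
    List<String> pending() throws IOException {
        if (!Files.exists(log)) {
            return new ArrayList<String>();
        }
        return Files.readAllLines(log, StandardCharsets.UTF_8);
    }

    /**
     * Hands the pending documents to the (imaginary) indexer and clears the log.
     * A real implementation would clear the log only after the index commit is durable.
     */
    List<String> commit() throws IOException {
        List<String> docs = pending();
        Files.write(log, new byte[0],
                StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING);
        return docs;
    }

    /** Simulates accept, restart, recover, commit using a temp file. */
    static String demo() {
        try {
            Path log = Files.createTempFile("inbox", ".log");
            try {
                new DurableInbox(log).accept("doc-1");
                new DurableInbox(log).accept("doc-2");

                // A new instance over the same log plays the role of a restarted server.
                DurableInbox afterRestart = new DurableInbox(log);
                List<String> recovered = afterRestart.pending();
                afterRestart.commit();
                return recovered + " then " + afterRestart.pending();
            } finally {
                Files.deleteIfExists(log);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [doc-1, doc-2] then []
    }
}
```

The point of the split is that accept() can be cheap and frequent, while commit() can be scheduled purely for indexing performance.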
 
This issue came up in the LCF talk that I gave, and I initially thought that separating the two kinds of events would necessarily be an LCF change, but the more I thought about it the more I realized that other Solr indexing clients may also benefit from such a separation.
 
Does anyone agree?  Where should this logic properly live?
 
Thanks,
Karl
 
 
 
 
Peter Wolanin | 24 May 15:31 2010

Re: Solr updateRequestHandler and performance vs. atomicity

We use autocommit with Solr and I've had this worry too - apparently
if you get a hard crash Solr will roll back the not-yet-committed
docs.

I don't think it's happened more than once in a year, but it is still possible.

-Peter

On Mon, May 24, 2010 at 9:10 AM,  <karl.wright <at> nokia.com> wrote:
> Hi all,
>
> It seems to me that the “commit” logic in the Solr updateRequestHandler (or
> wherever the logic is actually located) conflates two different semantics.
> One semantic is what you need to do to make the index process perform well.
> The other semantic is guaranteed atomicity of document reception by Solr.
>
> In particular, it would be nice to be able to post documents in such a way
> that you can guarantee that the document is permanently in Solr’s queue,
> safe in the event of a Solr restart, etc., even if the document has not yet
> been “committed”.
>
> This issue came up in the LCF talk that I gave, and I initially thought that
> separating the two kinds of events would necessarily be an LCF change, but
> the more I thought about it the more I realized that other Solr indexing
> clients may also benefit from such a separation.
>
> Does anyone agree?  Where should this logic properly live?
>
> Thanks,
> Karl
>
>
>
>

-- 
Peter M. Wolanin, Ph.D.
Momentum Specialist,  Acquia. Inc.
peter.wolanin <at> acquia.com