Nilesh Bansal | 1 Apr 01:09 2007

Lock files in a read-only application

Hi all,

We have a web-based application that searches a large Lucene index.
The application creates only IndexSearcher objects (and no
IndexWriters) for searching the index. After the application has run
for some time (a few hours), I can see lock files in the temp
directory of the form
/opt/tomcat/temp/lucene-5f77ffdc821b3f8e861949e9ecc35a53-commit.lock
The temp dir is set to /opt/tomcat/temp/ as we are using Tomcat.

Since the application is read-only, there is no point in it using
lock files. These lock files are causing a lot of trouble for me, as
their presence leads to lock-obtain timeouts in other threads. It
seems like a bug in Lucene.

The index is updated by a separate process, independent of the
web application, once in a while. Currently I am using an independent
shell script that checks the temp dir for lock files every few
seconds and deletes any it finds (to prevent lock-obtain timeouts
in other threads).

Any help will be appreciated.

thanks
Nilesh Bansal.
http://queens.db.toronto.edu/~nilesh/
Michael McCandless | 1 Apr 03:03 2007

Re: Lock files in a read-only application

"Nilesh Bansal" <nileshbansal <at> gmail.com> wrote:
> We have a web-based application that searches a large Lucene index.
> The application creates only IndexSearcher objects (and no
> IndexWriters) for searching the index. After the application has run
> for some time (a few hours), I can see lock files in the temp
> directory of the form
> /opt/tomcat/temp/lucene-5f77ffdc821b3f8e861949e9ecc35a53-commit.lock
> The temp dir is set to /opt/tomcat/temp/ as we are using Tomcat.
> 
> Since the application is read-only, there is no point in it using
> lock files. These lock files are causing a lot of trouble for me, as
> their presence leads to lock-obtain timeouts in other threads. It
> seems like a bug in Lucene.
> 
> The index is updated by a separate process, independent of the
> web application, once in a while. Currently I am using an independent
> shell script that checks the temp dir for lock files every few
> seconds and deletes any it finds (to prevent lock-obtain timeouts
> in other threads).

The commit lock has been problematic in the past and has been removed
entirely (so readers are indeed read only) as of release 2.1.

Before 2.1, it is necessary: a reader acquires this lock (creates
the file) while it reads the segments file, to prevent a writer
from overwriting the file at the same time. Likewise, a writer
acquires this lock before overwriting the segments file. However, the
lock should exist only briefly, so the fact that you see the lock file
staying around means something bad is happening.
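
To tell a briefly held commit lock from one that is stuck, it may help to look at how old each lock file is. The following is only a sketch written for a modern JDK: StaleLockFinder, findStale, and selfCheck are hypothetical names (not part of Lucene), the file pattern is taken from the lock file name quoted above, and the last-modified time is assumed to approximate the moment the lock file was created.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical diagnostic helper; not part of Lucene. */
final class StaleLockFinder {

    /**
     * Returns the lucene-*-commit.lock files in lockDir whose last-modified
     * time is more than maxAgeMillis before nowMillis.
     */
    static List<Path> findStale(Path lockDir, long maxAgeMillis, long nowMillis) {
        List<Path> stale = new ArrayList<>();
        try (DirectoryStream<Path> files =
                 Files.newDirectoryStream(lockDir, "lucene-*-commit.lock")) {
            for (Path file : files) {
                try {
                    long age = nowMillis - Files.getLastModifiedTime(file).toMillis();
                    if (age > maxAgeMillis) {
                        stale.add(file);
                    }
                } catch (NoSuchFileException released) {
                    // The lock vanished between listing and inspection,
                    // which is the healthy, short-lived case.
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return stale;
    }

    /** Self-check on a scratch directory: one old lock, one fresh lock, one unrelated file. */
    static int selfCheck() {
        try {
            Path dir = Files.createTempDirectory("lockcheck");
            long now = System.currentTimeMillis();
            Path old = Files.createFile(dir.resolve("lucene-abc123-commit.lock"));
            Files.setLastModifiedTime(old, FileTime.fromMillis(now - 120_000));
            Files.createFile(dir.resolve("lucene-def456-commit.lock"));
            Files.createFile(dir.resolve("unrelated.tmp"));
            return findStale(dir, 60_000, now).size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Pointing findStale at /opt/tomcat/temp with a threshold of, say, one minute (an arbitrary choice) would list only the locks that have outlived a normal read of the segments file. It reports them and deletes nothing.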


Nilesh Bansal | 1 Apr 03:30 2007

Re: Lock files in a read-only application

Thanks for your response. Is there a way to disable these read
locks without upgrading to 2.1? Our application uses its own custom
locking mechanism, so Lucene's locking is actually redundant. We
are currently using Lucene version 2.0.

The application has multiple threads (different web requests) reading
the same index simultaneously (say 20 concurrent threads). Could that be
a cause of this problem? Sometimes the lock files remain there for
long periods of time (more than a few minutes, which is bad).

Yes, the JVM sometimes crashes when it runs out of memory. There should be
some way to remove the lock files after such a crash (any fixes in
2.1?).

thanks
Nilesh


Xiaocheng Luan | 1 Apr 03:43 2007

Re: setBoost on Field

Which fields did you search? Only the headlines field?

DECAFFMEYER MATHIEU <MATHIEU.DECAFFMAYER <at> fortis.lu> wrote:
> Hi,
>
> I am parsing this file called Logistics.htm.
> I have a field named "headlines" that contains the word "clients" among others.
> When I don't put a boost on this field, I get a score of 0.06 when searching for clients.
> Then when I put a boost of "10", I get a score of 0.21.
> Yet I was expecting a score of 0.60.
> Could anyone clarify this behaviour for me?
> Thank you for any help.
>
> Matt

 

Chris Hostetter | 1 Apr 03:51 2007

Re: Lock files in a read-only application


: locks without upgrading to 2.1? Our application uses its own custom
: locking mechanism, so Lucene's locking is actually redundant. We
: are currently using Lucene version 2.0.

Since before the 2.0.0 release there has been a static
FSDirectory.setDisableLocks that can be called before opening any indexes
to prevent locking -- it's only intended to be used on indexes on
read-only disks -- which is not the case in your situation, since a separate
process is in fact modifying the index, but if you are confident in your
own locking mechanism you can use it.

: The application has multiple threads (different web requests) reading
: the same index simultaneously (say 20 concurrent threads). Could that be
: a cause of this problem? Sometimes the lock files remain there for
: long periods of time (more than a few minutes, which is bad).

Multiple reader threads should not cause the commit lock to stay around
that long, even if each thread is opening its own IndexReader (which they
should not do; it's better to open one and reuse it among many threads).
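
As a rough illustration of "open one and reuse it among many threads", here is a minimal sketch of a holder that opens a resource once and hands the same instance to every thread. It is plain Java for a modern JDK with no Lucene dependency: SharedHolder and SharedHolderDemo are hypothetical names, and in the web application the opener would be whatever code constructs the single IndexSearcher.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical holder that opens an expensive resource once and shares it. */
final class SharedHolder<T> {
    private final Supplier<T> opener;
    private volatile T instance;

    SharedHolder(Supplier<T> opener) {
        this.opener = opener;
    }

    /** Returns the shared instance, opening it on first use only. */
    T get() {
        T local = instance;
        if (local == null) {
            synchronized (this) {
                local = instance;
                if (local == null) {
                    local = opener.get();
                    instance = local;
                }
            }
        }
        return local;
    }
}

/** Demonstrates that concurrent requests trigger exactly one open. */
final class SharedHolderDemo {
    static int opensAfterConcurrentUse(int threadCount) {
        AtomicInteger opens = new AtomicInteger();
        SharedHolder<Object> holder = new SharedHolder<>(() -> {
            opens.incrementAndGet();   // stands in for opening the index
            return new Object();
        });
        Thread[] workers = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            workers[i] = new Thread(holder::get);
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return opens.get();
    }
}
```

With 20 request threads, as in the scenario described, the opener runs once, so the index would be opened once rather than once per request.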

: Yes, the JVM sometimes crashes when it runs out of memory. There should be
: some way to remove the lock files after such a crash (any fixes in
: 2.1?).

As Michael said, in 2.1 the commit lock doesn't even exist, and in general
there is a much more robust lock management system that lets you decide
what type of lock mechanism to use.

In 2.0.0 your only option for dealing with stale locks is to forcibly
(Continue reading)

ashwin kumar | 1 Apr 06:15 2007

Best GUI front end for Lucene

Hi all,

This is Ashwin. I have finished my indexing program and I am able to retrieve
the matching documents from the command line. Now I want to create a GUI that
displays the matching documents. Can anyone suggest the best GUI for Lucene
in a networked environment?

Thanks and regards,
Ashwin
rkwagle | 1 Apr 07:30 2007

Re: Best GUI front end for Lucene


I can describe a GUI application I am currently working on. For a networked
environment, as you say:
Maintain a file server holding the files to be indexed. Maintain an index
server where the indexing application runs and where the actual indexing of
the files occurs. Keep the index itself on a separate server or on the same
machine.
Now design a front end that runs on the client terminal, supplies the
location of the files to be indexed, and starts the indexing.
Then add features such as searching to the client application, so that the
search results show the actual file contents for matches found in the index
location.

Mohsen Saboorian | 1 Apr 08:30 2007

Emulating Pages Search


Hi,
Is there a way to emulate paged search in Lucene? I can use the following
piece of code to return the first page (10 items per page), but I
don't know how to navigate to the next page :-)

	IndexSearcher is = new ...
	...
	TopFieldDocs tops = is.search(query, null /*filter*/, 10, Sort.RELEVANCE);
	for (int i = 0; i < tops.scoreDocs.length; i++) {
		ScoreDoc scoreDoc = tops.scoreDocs[i];
		System.out.println(is.doc(scoreDoc.doc));
	}

I can see that tops.totalHits returns the count of all matched documents. So is
this really "paged search", or am I just doing a complete search and putting a
window on the returned results each time?

Thanks.

Chris Hostetter | 1 Apr 09:58 2007

Re: Emulating Pages Search


: I can see that tops.totalHits, returns all matched documents. So is this
: really "paged search", or I'm just doing a complete search and put a window
: on the returned result each time?

that's all paginated searching is ... you have to know the first 10
results to show 1-10, but you also have to know the first 20 results in
order to be able to show 11-20.
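
A small sketch of that windowing arithmetic, in plain Java for a modern JDK with no Lucene dependency. PageWindow, hitsNeeded, and slice are hypothetical names, pages are assumed to be numbered from zero, and the list stands in for the ranked scoreDocs array from the code in the question:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: cuts one page out of the top-N results of a search. */
final class PageWindow {

    /** Number of top hits that must be collected to display the given zero-based page. */
    static int hitsNeeded(int page, int pageSize) {
        return (page + 1) * pageSize;
    }

    /** Returns the entries of the requested page from a ranked list of hits (best first). */
    static <T> List<T> slice(List<T> topHits, int page, int pageSize) {
        int from = Math.min(page * pageSize, topHits.size());
        int to = Math.min(from + pageSize, topHits.size());
        return new ArrayList<>(topHits.subList(from, to));
    }
}
```

For the second page of 10, hitsNeeded(1, 10) gives 20: that value would replace the literal 10 passed to is.search, and the display loop would start at index 10 instead of 0.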

-Hoss
Michael McCandless | 1 Apr 10:38 2007

Re: Lock files in a read-only application

"Chris Hostetter" <hossman_lucene <at> fucit.org> wrote:
>
> : locks without upgrading to 2.1? Our application uses its own custom
> : locking mechanism, so Lucene's locking is actually redundant. We
> : are currently using Lucene version 2.0.
> 
> Since before the 2.0.0 release there has been a static
> FSDirectory.setDisableLocks that can be called before opening any indexes
> to prevent locking -- it's only intended to be used on indexes on
> read-only disks -- which is not the case in your situation, since a separate
> process is in fact modifying the index, but if you are confident in your
> own locking mechanism you can use it.

You need to be really certain your own locking protects Lucene
properly. Specifically, no IndexReader can be created (restarted)
while a writer is open against the index, and only one writer can be
open on the index at once (it sounds like you already have that). If
you're sure about that, then disabling the locks as Hoss describes
above is OK.
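
To make those two rules concrete, here is an in-process sketch only, for a modern JDK. IndexAccessGuard and its methods are hypothetical names, not Lucene API, and a ReentrantReadWriteLock works only inside a single JVM. With the index updated by a separate process, as in this thread, the same invariant would need a cross-process mechanism instead.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

/** Hypothetical guard enforcing: no reader is opened while a writer is open; one writer at a time. */
final class IndexAccessGuard {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    /** Opens a reader, waiting until no writer session is in progress. */
    <R> R openReader(Supplier<R> opener) {
        lock.readLock().lock();
        try {
            return opener.get();
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Non-blocking variant: returns null if another thread holds a writer session. */
    <R> R tryOpenReader(Supplier<R> opener) {
        if (!lock.readLock().tryLock()) {
            return null;
        }
        try {
            return opener.get();
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Runs a whole writer session (open, update, close) with exclusive access. */
    void writerSession(Runnable session) {
        lock.writeLock().lock();
        try {
            session.run();
        } finally {
            lock.writeLock().unlock();
        }
    }
}

/** Shows that a reader cannot be opened from another thread during a writer session. */
final class IndexAccessGuardDemo {
    static boolean readerRefusedDuringWrite() {
        IndexAccessGuard guard = new IndexAccessGuard();
        AtomicReference<String> seen = new AtomicReference<>("unset");
        guard.writerSession(() -> {
            Thread reader = new Thread(() -> seen.set(guard.tryOpenReader(() -> "reader")));
            reader.start();
            try {
                reader.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        return seen.get() == null;
    }
}
```

Only the act of opening a reader is guarded. A reader that is already open keeps working during a writer session, which matches the rule as stated above.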

> : The application has multiple threads (different web requests) reading
> : the same index simultaneously (say 20 concurrent threads). Could that be
> : a cause of this problem? Sometimes the lock files remain there for
> : long periods of time (more than a few minutes, which is bad).
> 
> Multiple reader threads should not cause the commit lock to stay around
> that long, even if each thread is opening its own IndexReader (which they
> should not do; it's better to open one and reuse it among many threads).

This part (commit lock staying around for so long) is definitely odd
(Continue reading)

