Harpreet S Walia | 1 Nov 2002 07:31

Re: Query Boosting

Thanks a lot, Dave and Ype.

Dave, I tried the setBoost you had mentioned. What I found is that I have
to set the boost of the other fields to 1 in order to get proper results. I
am not sure why, but it's working :-).

Ype, the other option that you suggested is great. I guess this is a more
proper way of doing it. I also found some references to the same in the
mail archives.

Thanks once again.

Regards,
Harpreet
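
To make the filter idea concrete, here is a minimal sketch in plain Java.
It does not use the Lucene API; the Hit class and the filterThenRank method
are made up for illustration. It assumes the filter approach means: drop the
documents whose stored credential does not match, then rank what is left by
the score of the user's words alone, so the credential never affects
relevancy.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SecurityFilterSketch {

    /** Hypothetical hit: the score from the user's words plus the stored credential. */
    static final class Hit {
        final String id;
        final double textScore;
        final String credential;

        Hit(String id, double textScore, String credential) {
            this.id = id;
            this.textScore = textScore;
            this.credential = credential;
        }
    }

    /** Drops hits the user may not see, then ranks the rest by text score only. */
    static List<String> filterThenRank(List<Hit> hits, String userCredential) {
        List<Hit> allowed = new ArrayList<>();
        for (Hit hit : hits) {
            if (hit.credential.equals(userCredential)) {
                allowed.add(hit);
            }
        }
        // Highest text score first; the credential never contributes to the order.
        allowed.sort(Comparator.comparingDouble((Hit h) -> h.textScore).reversed());
        List<String> ids = new ArrayList<>();
        for (Hit hit : allowed) {
            ids.add(hit.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit("a", 0.9, "staff"));
        hits.add(new Hit("b", 0.4, "admin"));
        hits.add(new Hit("c", 0.7, "admin"));
        System.out.println(filterThenRank(hits, "admin"));
    }
}
```

With index-time boosting the credential term still adds a (small) amount to
every score; with a filter it adds nothing, which is why the filter keeps
the ordering exactly as the user's words produced it.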

----- Original Message -----
From: "David Birtwell" <David_Birtwell <at> vwr.com>
To: "Lucene Users List" <lucene-user <at> jakarta.apache.org>
Sent: Thursday, October 31, 2002 11:41 PM
Subject: Re: Query Boosting

> Harpreet,
>
> This looks like a fine solution to me.  As an alternative, you could use
> Field boosting at index time.  Before you add the security credential
> field to the lucene Document object, you could call field.setBoost(0.1f).
>
> FYI, field boosting is not part of the release.  You can get it from the
> public CVS.
>

Rob Outar | 1 Nov 2002 15:05

Working with a Distributed System

All,

	I have what I think is an interesting problem.  I am working on a
distributed system where all repositories on each node have to be kept in
sync.  I am using Lucene on each node to index the data.  Users are allowed
to associate Fields with files and set values of existing fields; these
fields then also have to be associated with the same document on other
nodes.  I am using broadcast events to update the other nodes.  The problem
is that when a new node joins, I am not sure how to get the changes to the
various indexes to that node.  All nodes that are running together should be
in sync, but when a new node joins it does not know about any of the
changes.  The basic problem is how to keep the indexes the same on all of
the nodes.

	I thought about setting up a CVS server and storing the index in it, so
that when a new node joins it checks out the index.  But I do not know
enough about the internals of Lucene to know if that will work.  I would be
constantly committing files, because the index gets updated a lot on the
various nodes.  Also, would node b's committed files overwrite node a's
files, so that node a's changes to the index are lost?  It is a very
difficult problem; if anyone has any thoughts on this subject I would love
to hear them.

Thanks,

Rob

Brian Cuttler | 1 Nov 2002 16:46

Modify demo jar files, build question


Hello.

I've installed ant 1.5.1 on Solaris 8 so that I could modify
the demo jar files that come with Lucene 1.2.

I've made a small modification to IndexHTML.java which is
under /lucene-1.2-src/src/demo/org/apache/lucene/demo, I've
modified the code to exclude the indexing of the .txt files.

I believe I have to compile the java file in order to create
a class file in order to replace IndexHTML in the demo.jar file.

If there is already code I can use that will index only the
htm and html files I'd be happy to use it...

Is that right? Can I do that directly? (It produced errors, and
I don't know quite enough about Java; OK, I don't know anything
about Java.)

Obviously this didn't work, so after looking at the read.me I made
a second attempt using Ant.

# javac IndexHTML.java
IndexHTML.java:57: Class org.apache.lucene.analysis.standard.StandardAnalyzer not found in import.
import org.apache.lucene.analysis.standard.StandardAnalyzer;
       ^
IndexHTML.java:58: Package org.apache.lucene.index not found in import.
import org.apache.lucene.index.*;
       ^

Ype Kingma | 1 Nov 2002 20:25

Re: Working with a Distributed System

On Friday 01 November 2002 15:05, Rob Outar wrote:
> All,
>
> 	I have what I think is an interesting problem.  I am working on a
> distributed system where all repositories on each node have to be kept in
> sync.  I am using Lucene on each node to index the data.  Users are allowed
> to associate Fields with files and set values of existing fields; these
> fields then also have to be associated with the same document on other
> nodes.  I am using broadcast events to update the other nodes.  The problem
> is that when a new node joins, I am not sure how to get the changes to the
> various indexes to that node.  All nodes that are running together should
> be in sync, but when a new node joins it does not know about any of the
> changes.  The basic problem is how to keep the indexes the same on all of
> the nodes.
>
> 	I thought about setting up a CVS server and storing the index in it, so
> that when a new node joins it checks out the index.  But I do not know
> enough about the internals of Lucene to know if that will work.  I would be
> constantly committing files, because the index gets updated a lot on the
> various nodes.  Also, would node b's committed files overwrite node a's
> files, so that node a's changes to the index are lost?  It is a very
> difficult problem; if anyone has any thoughts on this subject I would love
> to hear them.

Assuming you run Unix, you might try and use rsync.
It works like cp (copy) but it takes into account what is already on the 
destination.
See http://rsync.samba.org/

I'd like to hear how it works for lucene indexes...

Kind regards,

Otis Gospodnetic | 2 Nov 2002 00:56

Re: Working with a Distributed System

That is the approach I took at my previous job, which involved some
Lucene work.  I used sdist to securely distribute the whole index (the
whole dir with index files) to a number of remote machines.

This may not work well if indices need to constantly be in sync, and if
the index can be modified on all index nodes.

How about using JMS and publish/subscribe with maybe time-stamped
messages, etc.?

Otis
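
A rough sketch of the time-stamped message idea, in plain Java. This is not
JMS code: the ChangeLogSketch and Change classes are hypothetical, and the
in-memory list stands in for whatever durable topic or store would hold the
messages. It assumes changes are published in timestamp order and that a
joining node knows the timestamp of the last change it applied (0 for a
brand-new node).

```java
import java.util.ArrayList;
import java.util.List;

public class ChangeLogSketch {

    /** Hypothetical change message: when it happened and what to apply to the index. */
    static final class Change {
        final long timestamp;
        final String docId;
        final String field;
        final String value;

        Change(long timestamp, String docId, String field, String value) {
            this.timestamp = timestamp;
            this.docId = docId;
            this.field = field;
            this.value = value;
        }
    }

    // In-memory stand-in for a durable, ordered message store.
    private final List<Change> log = new ArrayList<>();

    /** Records a change; callers are assumed to publish in timestamp order. */
    void publish(Change change) {
        log.add(change);
    }

    /** Returns every change newer than the timestamp the joining node last applied. */
    List<Change> changesSince(long lastSeenTimestamp) {
        List<Change> missed = new ArrayList<>();
        for (Change change : log) {
            if (change.timestamp > lastSeenTimestamp) {
                missed.add(change);
            }
        }
        return missed;
    }

    public static void main(String[] args) {
        ChangeLogSketch topic = new ChangeLogSketch();
        topic.publish(new Change(100L, "doc1", "owner", "rob"));
        topic.publish(new Change(200L, "doc1", "status", "draft"));
        topic.publish(new Change(300L, "doc2", "owner", "otis"));

        // A brand-new node replays everything; a node that was briefly offline
        // replays only what it missed.
        System.out.println(topic.changesSince(0L).size());
        System.out.println(topic.changesSince(200L).size());
    }
}
```

Each node would apply the returned changes to its own Lucene index in order,
then go back to listening for live broadcasts.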

--- Ype Kingma <ykingma <at> xs4all.nl> wrote:
> On Friday 01 November 2002 15:05, Rob Outar wrote:
> > All,
> >
> > 	I have what I think is an interesting problem.  I am working on a
> > distributed system where all repositories on each node have to be kept
> > in sync.  I am using Lucene on each node to index the data.  Users are
> > allowed to associate Fields with files and set values of existing
> > fields; these fields then also have to be associated with the same
> > document on other nodes.  I am using broadcast events to update the
> > other nodes.  The problem is that when a new node joins, I am not sure
> > how to get the changes to the various indexes to that node.  All nodes
> > that are running together should

Otis Gospodnetic | 2 Nov 2002 00:59

Re: Modify demo jar files, build question

Yes, you need to set your CLASSPATH properly.  I won't go into that
here, but any intro to Java book or web page will explain this.
This is nothing specific to Lucene.

Otis
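
On the request for code that indexes only the htm and html files: a small
sketch of the file-name test in plain Java. This is not the demo's actual
code; the class and method names are made up, and where to call it inside
IndexHTML.java is left to the reader. Compiling the modified demo still
needs the Lucene classes on the CLASSPATH, as noted above.

```java
import java.util.Locale;

public class HtmlFileFilterSketch {

    /** Returns true for names ending in .htm or .html, ignoring case. */
    static boolean isHtmlFile(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        return lower.endsWith(".html") || lower.endsWith(".htm");
    }

    public static void main(String[] args) {
        // Only the first two names would be passed on to the indexer.
        String[] names = {"index.html", "ABOUT.HTM", "notes.txt"};
        for (String name : names) {
            System.out.println(name + " -> " + isHtmlFile(name));
        }
    }
}
```

The idea is to skip any file for which isHtmlFile returns false before a
Document is built for it.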

--- Brian Cuttler <brian <at> wadsworth.org> wrote:
> 
> Hello.
> 
> I've installed ant 1.5.1 on Solaris 8 so that I could modify
> the demo jar files that come with Lucene 1.2.
> 
> I've made a small modification to IndexHTML.java which is
> under /lucene-1.2-src/src/demo/org/apache/lucene/demo, I've
> modified the code to exclude the indexing of the .txt files.
> 
> I believe I have to compile the java file in order to create
> a class file in order to replace IndexHTML in the demo.jar file.
> 
> If there is already code I can use that will index only the
> htm and html files I'd be happy to use it...
> 
> Is that right? Can I do that directly? (It produced errors, and
> I don't know quite enough about Java; OK, I don't know anything
> about Java.)
> 
> Obviously this didn't work, so after looking at the read.me I made
> a second attempt using Ant.
> 

Harpreet S Walia | 2 Nov 2002 05:09

Re: Query Boosting

Hi,

I am expecting my index to grow to millions (5-10) of documents very soon,
so would using a limiting filter pose any problems for an index of that size?

Thanks And Regards,
Harpreet

----- Original Message -----
From: "Ype Kingma" <ykingma <at> xs4all.nl>
To: "Lucene Users List" <lucene-user <at> jakarta.apache.org>;
<harpreet <at> sansuisoftware.com>
Sent: Friday, November 01, 2002 12:05 AM
Subject: Re: Query Boosting

> On Thursday 31 October 2002 18:45, harpreet <at> sansuisoftware.com wrote:
> > Hi,
> >
> > My application requires a facility to have security built into the
> > documents, so that when I search for a given word the results are
> > filtered depending on the security credentials stored in a field in
> > the document.
> >
> > Now the problem I am facing is that the score of such results includes
> > these security credentials in the query in addition to the query
> > entered by the user. So the relevancy according to the actual search
> > word entered by the user is affected.
> >

Paul | 2 Nov 2002 05:48

Re: Working with a Distributed System

My initial reaction to the first post was to use rsync too. I was about
to post that, when I read Ype's post. ;-)

Another option is to do what we're doing, and write a daemon
which talks to Lucene on the server it runs on, and also serves
requests coming in on a specific port. That way many clients
can have the benefit of one index.

You are welcome to our source, once we've got it to a stage
where we can wrap it all up nicely and Open Source it. As
it stands it is currently working well in a beta form.

Cheers,
Paul.
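
For readers wondering what such a daemon might look like, here is a bare
sketch in plain Java. It is not the software described above and does not
call Lucene: the SearchDaemonSketch class, its one-line text protocol, and
the in-memory map standing in for the index are all assumptions made for
illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class SearchDaemonSketch {

    // Stand-in for the single shared index: term -> ids of matching documents.
    private final Map<String, List<String>> index;

    SearchDaemonSketch(Map<String, List<String>> index) {
        this.index = index;
    }

    /** Answers one request line (a single term) with space-separated document ids. */
    String handle(String requestLine) {
        String term = requestLine == null
                ? ""
                : requestLine.trim().toLowerCase(Locale.ROOT);
        List<String> ids = index.getOrDefault(term, Collections.<String>emptyList());
        return String.join(" ", ids);
    }

    /** Accepts one client, reads one request line, writes one reply line. */
    void serveOne(ServerSocket server) throws IOException {
        try (Socket client = server.accept();
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(client.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(client.getOutputStream(), StandardCharsets.UTF_8),
                     true)) {
            out.println(handle(in.readLine()));
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> index = new HashMap<>();
        index.put("lucene", Arrays.asList("doc1", "doc3"));
        SearchDaemonSketch daemon = new SearchDaemonSketch(index);

        // A real daemon would bind a ServerSocket to its port and call
        // serveOne in a loop; here only the request handling is exercised.
        System.out.println(daemon.handle("Lucene"));
    }
}
```

Keeping the request handling separate from the socket code means the lookup
can be swapped for a real index search without touching the network part.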

Otis Gospodnetic wrote:
> That is the approach I took at my previous job, which involved some
> Lucene work.  I used sdist, to securely distribute the whole index (the
> whole dir with index files) to a number of remote machines.
>
> This may not work well if indices need to constantly be in sync, and if
> the index can be modified on all index nodes.
>
> How about using JMS and publish/subscribe with maybe time-stamped
> messages, etc.?
>
> Otis
>
> --- Ype Kingma <ykingma <at> xs4all.nl> wrote:
> > On Friday 01 November 2002 15:05, Rob Outar wrote:

Otis Gospodnetic | 2 Nov 2002 06:26

Re: Working with a Distributed System

That sounds like a potentially nice piece of software for the Lucene
Sandbox contributions area.  Thanks.

Otis

--- Paul <paul <at> waite.net.nz> wrote:
> My initial reaction to the first post was to use rsync too. I was
> about
> to post that, when I read Ype's post. ;-)
> 
> Another option is to do what we're doing, and write a daemon
> which talks to Lucene on the server it runs on, and also serves
> requests coming in on a specific port. That way many clients
> can have the benefit of one index.
> 
> You are welcome to our source, once we've got it to a stage
> where we can wrap it all up nicely and Open Source it. As
> it stands it is currently working well in a beta form.
> 
> Cheers,
> Paul.
> 
> 
> Otis Gospodnetic wrote:
> > That is the approach I took at my previous job, which involved some
> > Lucene work.  I used sdist, to securely distribute the whole index
> (the
> > whole dir with index files) to a number of remote machines.
> >
> > This may not work well if indices need to constantly be in sync,

Harpreet S Walia | 2 Nov 2002 08:12

Score calculation


Can anyone please tell me where the score calculation takes place in the Lucene API?

TIA,

Regards,
Harpreet
