Ganesh | 1 Feb 07:58 2010

Re: Roadmap for next release

Any reply to this thread?

----- Original Message ----- 
From: "Ganesh" <emailgane <at> yahoo.co.in>
To: <java-user <at> lucene.apache.org>
Sent: Thursday, January 28, 2010 2:35 PM
Subject: Roadmap for next release

Hello all,

Please provide information about the roadmap for the next release. It would be
really helpful for planning our product roadmap for this year.

Are the features below planned for this year?
-----------------------------------------
1. Reducing sorting memory consumption by caching, or offloading the sort cache to disk
2. If not all records take part in sorting, is there any way to build a custom field
cache array based on some filter criteria?

Regards
Ganesh  
Send instant messages to your online friends http://in.messenger.yahoo.com 

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe <at> lucene.apache.org
For additional commands, e-mail: java-user-help <at> lucene.apache.org

Dennis Hendriksen | 1 Feb 08:08 2010

RE: combine query score with external score

Hi Steve,

Thank you for your suggestions. Payloads might indeed help me to
overcome the precision loss problem that I am experiencing right now. I
don't think it will help me with the combining of Lucene scores with
external scores however.

Is there anyone who has a suggestion how to deal with that?

Dennis 

On Thu, 2010-01-28 at 13:52 -0500, Steven A Rowe wrote:
> Hi Dennis,
> 
> You should check out payloads (arbitrary per-index-term byte[] arrays), which can be used to encode
values which are then incorporated into documents' scores, by overriding Similarity.scorePayload():
> 
> <http://lucene.apache.org/java/3_0_0/api/core/org/apache/lucene/search/Similarity.html#scorePayload%28int,%20java.lang.String,%20int,%20int,%20byte[],%20int,%20int%29>
> 
> The Lucene in Action 2 MEAP has a nice introduction to using payloads to influence scoring, in section 6.5.
> 
> See also this (slightly out-of-date*) blog post "Getting Started with Payloads" by Grant Ingersoll at
Lucid Imagination:
> 
> <http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/>
> 
> *Note that since this blog post was written, BoostingTermQuery was renamed to PayloadTermQuery (in
Lucene 2.9.0+ ; see http://issues.apache.org/jira/browse/LUCENE-1827 ; wow - this issue isn't
mentioned in CHANGES.txt???):
> 
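A payload is just a byte[] attached to a term position; a boost is commonly packed into it as a 4-byte IEEE float, which is (as an assumption about the encoding) what Lucene's PayloadHelper does. A minimal sketch of that roundtrip without the Lucene API:

```java
import java.nio.ByteBuffer;

public class PayloadFloat {
    // Pack a float boost into 4 payload bytes at indexing time.
    static byte[] encode(float boost) {
        return ByteBuffer.allocate(4).putFloat(boost).array();
    }

    // Recover the boost, e.g. inside an overridden Similarity.scorePayload().
    static float decode(byte[] payload, int offset) {
        return ByteBuffer.wrap(payload, offset, 4).getFloat();
    }
}
```

In a real index the encode side would run in a payload-producing TokenFilter and the decode side in scorePayload().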

jchang | 1 Feb 08:25 2010

Can't get tokenization/stop words working


I want to be able to store a doc with a field with this as a substring:
  www.fubar.com
And then I want this document to get returned when I query on
  fubar or
  fubar.com

I assume what I should do is make www and com stop words, and make sure the
field is tokenized, so it will break it up along the '.'

I thought I should take a list of English stop words, add in 'www' and 'com',
and then make sure the field is tokenized, which I did by using this
constructor:
new Field("name", "value",  Field.Store.YES, Field.Index.Analyzed).
I saw that Field.Index.Analyzed meant it would be tokenized.

It is not working.  Searching on fubar or fubar.com does not return it. 
Thanks for any help.
--

-- 
View this message in context: http://old.nabble.com/Can%27t-get-tokenization-stop-works-working-tp27400546p27400546.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
Uwe Schindler | 1 Feb 08:33 2010

Apache Hadoop Get Together Berlin March 2010

Hello,

this is to announce the next Apache Hadoop Get Together Berlin:

                      When: March 10th, 5p.m. 
                      Where: Newthinking store Berlin 

Talks scheduled so far:

   * Bram Smeets (JTeam/ Amsterdam): Spatial Search.
   * Dragan Milosevic (zanox/ Berlin): Product Search and Reporting powered by 
     Hadoop.
   * Bob Schulze (eCircle/ Munich): Database and Table Design Tips with HBase.

A big Thanks goes to the newthinking store for providing a room in the center of Berlin for us. Another big
thanks goes to Nokia Gate 5 for sponsoring videos of the talks. Links to the videos will be posted after the event.

More information as well as registration is available on upcoming or xing:

 http://upcoming.yahoo.com/event/5280014
 https://www.xing.com/events/apache-hadoop-march-2010-459305

Looking forward to seeing you in Berlin,
Isabel Drost
Ian Lea | 1 Feb 10:19 2010

Re: best way to compare Documents

> ...
> Is there some convenient way to compare Lucene Documents?

Not that I know of.

> I want to check if I should update a document based on whether field values have changed and whether fields
have been added or removed.
>
> Is it as simple as:
>
> newDoc.equals(oldDoc)

No!

> I don't need to create the newDoc first, so I could compare by field. I am creating Fields like so:
>
> new Field("modified", modified, Field.Store.YES, Field.Index.NOT_ANALYZED)
>
> So, would it be better to:
> * check oldDoc's fields count against the to be created documents desired fields count, and
> * loop through the fields and compare values

Yes.  Of course you won't be able to compare unstored fields.

> Is there a better way to create Fields and/or Documents for this type of thing?

Hash or CRC as Shashi suggested, but you'll still need to compare old and
new hash values.
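The count-then-loop comparison can be sketched with plain maps standing in for the stored name/value pairs of the old and new documents (an illustration of the logic only, not the Document API, and it ignores multi-valued fields):

```java
import java.util.Map;

public class DocCompare {
    // True if both "documents" hold exactly the same stored fields
    // with exactly the same values.
    static boolean sameFields(Map<String, String> oldDoc,
                              Map<String, String> newDoc) {
        if (oldDoc.size() != newDoc.size()) return false; // field added/removed
        for (Map.Entry<String, String> e : newDoc.entrySet()) {
            String oldValue = oldDoc.get(e.getKey());
            if (oldValue == null || !oldValue.equals(e.getValue())) return false; // value changed
        }
        return true;
    }
}
```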

--

Ian Lea | 1 Feb 10:27 2010

Re: Can't get tokenization/stop words working

If you make com a stop word then you won't be able to search for it,
but a search for fubar should have worked.  Are you sure your analyzer
is doing what you want?  You don't tell us what analyzer you are
using.

Tips:
  use Luke to see what has been indexed
  read the FAQ entry
http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2BAC8_incorrect_hits.3F

--
Ian.
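For illustration, the tokenization jchang wants — split on '.', lowercase, drop 'www' and 'com' — looks like this in plain Java (not the Lucene analyzer API; note that StandardAnalyzer may keep 'www.fubar.com' together as a single host-like token, which would explain the missing hits):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class NaiveTokenizer {
    static final Set<String> STOP = new HashSet<>(Arrays.asList("www", "com"));

    // Lowercase, split on anything that is not a letter, drop stop words.
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z]+")) {
            if (!t.isEmpty() && !STOP.contains(t)) out.add(t);
        }
        return out;
    }
}
```

With this behaviour, "www.fubar.com" yields just "fubar", so a query on fubar matches; Luke will show whether the real analyzer produces the same terms.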

On Mon, Feb 1, 2010 at 7:25 AM, jchang <jchangkihatest <at> gmail.com> wrote:
>
> I want to be able to store a doc with a field with this as a substring:
>  www.fubar.com
> And then I want this document to get returned when I query on
>  fubar or
>  fubar.com
>
> I assume what I should do is make www and com stop words, and make sure the
> field is tokenized, so it will break it up along the '.'
>
> I thought I should take a list of English stop words, add in 'www' and 'com',
> and then make sure the field is tokenized, which I did by using this
> constructor:
> new Field("name", "value",  Field.Store.YES, Field.Index.Analyzed).
> I saw that Field.Index.Analyzed meant it would be tokenized.
>

Ian Lea | 1 Feb 10:42 2010

Re: combine query score with external score

Have you considered the function query stuff?
oal.search.function.CustomScoreQuery and friends.
If you provide your own CustomScoreQuery implementation you can do
scoring however you like.

--
Ian.
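The shape of such a combination — Lucene's score multiplied by an external per-document value — sketched in plain Java (the map lookup and the multiplicative combination are assumptions; CustomScoreQuery's customScore() lets you define any function):

```java
import java.util.Map;

public class ExternalScore {
    // External scores keyed by document id, e.g. loaded from a database.
    // In a real CustomScoreQuery subclass this logic would live in
    // customScore(doc, subQueryScore, valSrcScore).
    static float combine(float luceneScore, int docId, Map<Integer, Float> external) {
        Float ext = external.get(docId);
        return ext == null ? luceneScore : luceneScore * ext;
    }
}
```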

On Mon, Feb 1, 2010 at 7:08 AM, Dennis Hendriksen
<dennis.hendriksen <at> kalooga.com> wrote:
> Hi Steve,
>
> Thank you for your suggestions. Payloads might indeed help me to
> overcome the precision loss problem that I am experiencing right now. I
> don't think it will help me with the combining of Lucene scores with
> external scores however.
>
> Is there anyone who has a suggestion how to deal with that?
>
> Dennis
>
> On Thu, 2010-01-28 at 13:52 -0500, Steven A Rowe wrote:
>> Hi Dennis,
>>
>> You should check out payloads (arbitrary per-index-term byte[] arrays), which can be used to encode
values which are then incorporated into documents' scores, by overriding Similarity.scorePayload():
>>
>> <http://lucene.apache.org/java/3_0_0/api/core/org/apache/lucene/search/Similarity.html#scorePayload%28int,%20java.lang.String,%20int,%20int,%20byte[],%20int,%20int%29>
>>
>> The Lucene in Action 2 MEAP has a nice introduction to using payloads to influence scoring, in section 6.5.

Suraj Parida | 1 Feb 12:43 2010

Searching compressed text using CompressionTools


Hi,

I want to compress a text field (due to its large size and spaces) during
indexing.

I am unable to retrieve the stored value, and I also want to search it.

My code during compressing is as follows:

    String value = "Some large text ...... ";
    byte[] valuesbyte = CompressionTools.compress(value.getBytes());
    final Field f = new Field(key, valuesbyte, Field.Store.YES);
    f.setOmitTermFreqAndPositions(true);
    f.setOmitNorms(true);
    document.add(f);

Please tell me how to search and display this value.

Regards
Suraj
--

-- 
View this message in context: http://old.nabble.com/Searching-compressed-text-using-CompressionTools-tp27402945p27402945.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
Uwe Schindler | 1 Feb 12:46 2010

RE: Searching compressed text using CompressionTools

Compression is only applied to *stored* fields. For indexed fields there is no
compression available (how would that work?). You must clearly differentiate
between stored and indexed fields!

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe <at> thetaphi.de

> -----Original Message-----
> From: Suraj Parida [mailto:parida.suraj <at> gmail.com]
> Sent: Monday, February 01, 2010 12:44 PM
> To: java-user <at> lucene.apache.org
> Subject: Searching compressed text using CompressionTools
> 
> 
> Hi,
> 
> I want to compress a text field (due to its large size and spaces),
> during
> indexing.
> 
> I am unable to retrieve the stored value, and I also want to search it.
> 
> 
> My code during compressing is as follows:
>                                 String value = "Some large text ......
> ";
> 				byte[] valuesbyte =
> CompressionTools.compress(value.getBytes());

Uwe Schindler | 1 Feb 12:50 2010

RE: Searching compressed text using CompressionTools

I forgot:

To also index those fields, add the value a second time with only indexing
enabled and the same field name:

String value = "Some large text ...... ";
byte[] valuesbyte = CompressionTools.compress(value.getBytes());
Field f = new Field(key, valuesbyte, Field.Store.YES);
document.add(f); // the stored one, no need for norms/TF suppression
f = new Field(key, value, Field.Store.NO, Field.Index.ANALYZED);
f.setOmitTermFreqAndPositions(true);
f.setOmitNorms(true);
document.add(f);
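CompressionTools is, to my understanding, a thin wrapper over java.util.zip, so the stored-field roundtrip — compress when indexing, decompress when displaying — can be sketched without Lucene (in real code, CompressionTools.decompress on the Document's binary value replaces the decompress step below):

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class CompressRoundtrip {
    // What happens at index time for the stored field.
    static byte[] compress(byte[] value) {
        Deflater deflater = new Deflater();
        deflater.setInput(value);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) out.write(buf, 0, deflater.deflate(buf));
        deflater.end();
        return out.toByteArray();
    }

    // What happens at display time when the stored bytes are read back.
    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        try {
            while (!inflater.finished()) out.write(buf, 0, inflater.inflate(buf));
        } catch (DataFormatException e) {
            throw new RuntimeException(e);
        }
        inflater.end();
        return out.toByteArray();
    }
}
```

Searching never touches the compressed bytes: the query runs against the second, analyzed field, and only display reads the stored bytes back.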

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe <at> thetaphi.de

> -----Original Message-----
> From: Uwe Schindler [mailto:uwe <at> thetaphi.de]
> Sent: Monday, February 01, 2010 12:46 PM
> To: java-user <at> lucene.apache.org
> Subject: RE: Searching compressed text using CompressionTools
> 
> Compression is only used for *stored* fields. For indexing there is no
> compression available (how should that work). You must clearly
> differentiate between stored and indexed fields!
> 
> -----

