KK | 1 Jun 2009 10:48

How to post data encoded in NCR (decimal) to the Lucene indexer?

Hi All,
I'm trying to index data into a Lucene index in Unicode UTF-8 format. All my
search queries are of the form \uxxxx and it's working fine. But the problem
is that in some cases, when the document [actually web page content] contains
decimal Numeric Character References, these get indexed as such. For
example, I have the following data [some Telugu-language data]:

డాక్టర్

When I index this, the references get indexed as such, and querying using
the \uxxxx format does not give any result. So I want to know whether there
is any way to configure Lucene to take care of such things by itself, or
whether I have to convert the data to \uxxxx format myself [this is just
replacing &# with \u and replacing the 4-digit number with its hex
equivalent]. This manual method does not sound good to me. If there is a
standard way of doing this, please let me know. Thank you.

--KK.

One question:
Is it mandatory that data to be indexed by Lucene be in \uxxxx
format for Unicode UTF-8 encoded data?
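
For what it's worth, Lucene indexes ordinary Java Strings, and \uxxxx is only how a character is written in Java source, so the data does not have to be "in \uxxxx format". What does need to happen is that the &#NNNN; references are turned into real characters before the text reaches the indexer. A minimal sketch of that step in plain Java follows; the class and method names (NcrDecoder, decode) are made up for illustration, it handles only decimal references, and a full HTML parser would also cover hex references and named entities.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: decodes decimal NCRs before indexing.
class NcrDecoder {
    // Matches decimal numeric character references such as &#3105;
    private static final Pattern DECIMAL_NCR = Pattern.compile("&#(\\d{1,7});");

    // Replaces every decimal NCR with the character it denotes.
    // References that are not valid Unicode code points are left untouched.
    static String decode(String text) {
        Matcher m = DECIMAL_NCR.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = Integer.parseInt(m.group(1));
            String replacement = Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The seven code points of the Telugu word from the post, written as decimal NCRs.
        String raw = "&#3105;&#3134;&#3093;&#3149;&#3103;&#3120;&#3149;";
        System.out.println(decode(raw));
    }
}
```

The decoded string can then be passed to the field being indexed, and the same query strings that already work for plain Unicode text should match it.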
vanshi | 1 Jun 2009 17:39

Re: No hits while searching!


Thanks Erick, I was able to get this to work. As you said, Luke is a great
tool for looking into what gets stored in the index, though in my case I was
searching before the indexes were created, so I was getting zero hits.

On a side note, I'm seeing strange output with a prefix query: it only works
when I have 3 or more letters in the first name/last name. Any idea
what is going on here? Please see the log output below.

02:05:20,996 INFO  [PhysicianQueryBuilder] Entered addTypeSpecificTerms in PhysicianQuerybuilder with exactName=true
02:05:20,996 INFO  [PhysicianQueryBuilder] Before running Prefix query, First name: ang
02:05:20,996 INFO  [PhysicianQueryBuilder] Before running  Prefix query, Last name: john
02:05:21,012 INFO  [LuceneIndexService] the query is: +(FIRST_NAME_EXACT:ang*) +(LAST_NAME_EXACT:john*)
02:05:21,012 INFO  [LuceneIndexService] Result Size: 1

02:06:03,578 INFO  [PhysicianQueryBuilder] Entered addTypeSpecificTerms in PhysicianQuerybuilder with exactName=true
02:06:03,578 INFO  [PhysicianQueryBuilder] Before running term query, First name: a
02:06:03,578 INFO  [PhysicianQueryBuilder] Before running term query, Last name: johns
02:06:03,578 INFO  [LuceneIndexService] the query is: +() +(LAST_NAME_EXACT:johns*)
02:06:03,578 INFO  [LuceneIndexService] Result Size: 0

02:08:01,548 INFO  [PhysicianQueryBuilder] Entered addTypeSpecificTerms in

Sudarsan, Sithu D. | 1 Jun 2009 17:41

RE: No hits while searching!


Do you use stopword filtering?

Sincerely,
Sithu D Sudarsan

-----Original Message-----
From: vanshi [mailto:nilu.thakur <at> gmail.com] 
Sent: Monday, June 01, 2009 11:39 AM
To: java-user <at> lucene.apache.org
Subject: Re: No hits while searching!

Thanks Erick, I was able to get this work...as you said ..Luke is a
great
tool to look in to what gets stored as indexes though in my case I was
searching before the indexes were created so i was getting zero hits.

On side note, I'm running a strange output with prefix query...it only
works
when i have 3 or more than 3 letters in the first name/last name. Any
idea
what is going on here? Please see the output from log here.

02:05:20,996 INFO  [PhysicianQueryBuilder] Entered addTypeSpecificTerms
in
PhysicianQuerybuilder with exactName=true
02:05:20,996 INFO  [PhysicianQueryBuilder] Before running Prefix query,
First name: ang
02:05:20,996 INFO  [PhysicianQueryBuilder] Before running  Prefix query,
Last name: john

Marco Lazzara | 1 Jun 2009 17:50

Re: Searching index problems with tomcat

Just to let you know how my problem turned out: I decided to use SWT for the
standalone app and Google Web Toolkit for the web app!

:):)

bye
ML

2009/5/27 N Hira <nhira <at> cognocys.com>

>
> Cool!
>
> 1.  So you are creating a parser with { name, synonyms, propIn }, correct?
>
> 2.  Sorry -- I meant the output of "query.toString()"; I'm expecting to see
> something like this when the sentence parameter is set to philipcimiano:
>    name:philipcimiano synonyms:philipcimiano propIn:philipcimiano
>
> 3.  Out of curiosity, what is the value of topDocs.totalHits for your
> search?
>
> -h
>
>
>
> ----- Original Message ----
> From: Marco Lazzara <marco.lazzara <at> gmail.com>
> To: java-user <at> lucene.apache.org
> Sent: Wednesday, May 27, 2009 2:41:44 PM

Matthew Hall | 1 Jun 2009 17:58

Re: No hits while searching!

Yeah, he's gotta be.

You might be better off using something like a lowercase analyzer here,
since punctuation in a name is likely important.

Matt
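
To make the stop-word point concrete: if the name passes through a stop-word filter before the query is built, a token like "a" is thrown away, nothing is left for that field, and you end up with an empty required clause like the +() in the log. Here is a rough plain-Java imitation of that effect. It is not Lucene code; the stop list is just an illustrative subset, and StopWordDemo, analyze and prefixClause are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical demo: imitates lowercase + stop-word filtering outside Lucene.
class StopWordDemo {
    // Illustrative subset of a typical English stop list (an assumption).
    static final Set<String> STOP_WORDS = new HashSet<String>(
            Arrays.asList("a", "an", "and", "as", "at", "in", "is", "of", "the", "to"));

    // Lowercases, splits on non-letters, and drops stop words.
    static List<String> analyze(String input) {
        List<String> tokens = new ArrayList<String>();
        for (String t : input.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Builds a required prefix clause from the first surviving token,
    // or returns null when analysis removed everything.
    static String prefixClause(String field, String name) {
        List<String> tokens = analyze(name);
        if (tokens.isEmpty()) {
            return null; // caller should skip the clause instead of emitting +()
        }
        return "+(" + field + ":" + tokens.get(0) + "*)";
    }

    public static void main(String[] args) {
        System.out.println(prefixClause("FIRST_NAME_EXACT", "ang"));
        System.out.println(prefixClause("FIRST_NAME_EXACT", "a"));
    }
}
```

Checking for "nothing survived analysis" in the query builder, and skipping the clause in that case, avoids sending an empty required clause to the searcher.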

Sudarsan, Sithu D. wrote:
>  
>
> Do you use stopword filtering?
>
> Sincerely,
> Sithu D Sudarsan
>
> -----Original Message-----
> From: vanshi [mailto:nilu.thakur <at> gmail.com] 
> Sent: Monday, June 01, 2009 11:39 AM
> To: java-user <at> lucene.apache.org
> Subject: Re: No hits while searching!
>
>
> Thanks Erick, I was able to get this work...as you said ..Luke is a
> great
> tool to look in to what gets stored as indexes though in my case I was
> searching before the indexes were created so i was getting zero hits.
>
> On side note, I'm running a strange output with prefix query...it only
> works
> when i have 3 or more than 3 letters in the first name/last name. Any

Tarandeep Singh | 1 Jun 2009 18:54

Distributed Lucene Questions

Hi All,

I am trying to build a distributed system to build and serve lucene indexes.
I came across the Distributed Lucene project-
http://wiki.apache.org/hadoop/DistributedLucene
https://issues.apache.org/jira/browse/HADOOP-3394

and have a couple of questions. It will be really helpful if someone can
provide some insights.

1) Is this code production ready?
2) Does anyone have performance data for this project?
3) It allows searches and updates/deletes to be performed at the same time.
How well will the system perform if there are frequent updates to the
system? Will it handle the search and update load easily, or will it be
better to rebuild or update the indexes on different machines and then
deploy the indexes back to the machines that are serving the indexes?

Basically I am trying to choose between the 2 approaches-

1) Use Hadoop to build and/or update Lucene indexes and then deploy them on a
separate cluster that will take care of load balancing, fault tolerance etc.
There is a package in Hadoop contrib that does this, so I can use that code.

2) Use and/or modify the Distributed Lucene code.

I am expecting daily updates to our index, so I am not sure whether the
Distributed Lucene code (which allows searches and updates on the same
indexes) will be able to handle the search and update load efficiently.


Jordon Saardchit | 1 Jun 2009 19:16

Lucene on NFS/iSCSI


So I've read a lot about nightmares with Lucene over shared indices using NFS,
and was curious whether anyone has any experience running Lucene over iSCSI.
Specifically, do the same sort of lock failure issues occur as with NFS? I'm
looking into multiple machines mounted to a SAN via iSCSI with an accelerated
hardware initiator.

Looking to just get a general idea before investing in the possible solution.

Thanks,
Jordon
vanshi | 1 Jun 2009 19:31

Re: No hits while searching!


Thanks Matt & Sithu. Yes, it was due to the stop word analyzer. For now I'm
using a simple analyzer temporarily, though I know even the simple analyzer
cannot handle quotes in names. Can somebody please point me towards how to
handle quotes in names in a query using a lowercase analyzer?

thanks,
Vanshi

Matthew Hall-7 wrote:
> 
> Yeah, he's gotta be.
> 
> You might be better of using something like a lowercase analyzer here, 
> since punctuation in a name is likely important.
> 
> Matt
> 
> Sudarsan, Sithu D. wrote:
>>  
>>
>> Do you use stopword filtering?
>>
>> Sincerely,
>> Sithu D Sudarsan
>>
>> -----Original Message-----
>> From: vanshi [mailto:nilu.thakur <at> gmail.com] 
>> Sent: Monday, June 01, 2009 11:39 AM
>> To: java-user <at> lucene.apache.org

Matthew Hall | 1 Jun 2009 19:51

Re: No hits while searching!

Just build your own.

Here's exactly what you are looking for:

(Mind you I just whipped this out, and didn't compile it... so there 
could be minor syntax errors here.)

You will also obviously have to make your own package declaration, and 
your own imports.

So anyhow, the really neat thing about lucene, is being able to do 
exactly what we just did here, you can chain these tokenizers and 
filters together in almost any way you want, and create custom analyzers 
outta them.

It's a good thing to become familiar with, because I will nearly promise 
you that this analyzer here will ALSO probably be insufficient for your 
needs.

Anyhow, hope this helps.

Matt

/**
 * Custom Lowercase Analyzer
 *
 * @author mhall
 *
 * This analyzer tokenizes on whitespace, and then lowercases the token.
 *
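
The behaviour described in that comment (split on whitespace, then lowercase each token) can be imitated in plain Java to see what it would do to a name containing a quote. This is only a sketch of the analysis logic, not the analyzer from the post and not Lucene API; in Lucene the usual building blocks for this are WhitespaceTokenizer and LowerCaseFilter, and the class name below is made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for a whitespace + lowercase analysis chain.
class WhitespaceLowercase {
    // Splits on runs of whitespace and lowercases each token.
    // Punctuation inside a token (for example an apostrophe) is preserved,
    // and no stop words are removed.
    static List<String> analyze(String input) {
        List<String> tokens = new ArrayList<String>();
        for (String t : input.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("O'Brien A"));
    }
}
```

Because nothing but whitespace splits a token, a name like O'Brien stays in one piece and a single-letter name survives, provided the same analysis is applied at both index time and query time.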

Ken Krugler | 1 Jun 2009 20:05

Re: Distributed Lucene Questions

>Hi All,
>
>I am trying to build a distributed system to build and serve lucene indexes.
>I came across the Distributed Lucene project-
>http://wiki.apache.org/hadoop/DistributedLucene
>https://issues.apache.org/jira/browse/HADOOP-3394
>
>and have a couple of questions. It will be really helpful if someone can
>provide some insights.
>
>1) Is this code production ready?
>2) Does someone has performance data for this project?
>3) It allows searches and updates/deletes to be performed at the same time.
>How well the system will perform if there are frequent updates to the
>system. Will it handle the search and update load easily or will it be
>better to rebuild or update the indexes on different machines and then
>deploy the indexes back to the machines that are serving the indexes?
>
>Basically I am trying to choose between the 2 approaches-
>
>1) Use Hadoop to build and/or update Lucene indexes and then deploy them on
>separate cluster that will take care or load balancing, fault tolerance etc.
>There is a package in Hadoop contrib that does this, so I can use that code.
>
>2) Use and/or modify the Distributed Lucene code.
>
>I am expecting daily updates to our index so I am not sure if Distribtued
>Lucene code (which allows searches and updates on the same indexes) will be
>able to handle search and update load efficiently.


