Ganesh | 1 Dec 2008 05:34

Re: Which is faster/better

I have to tag a document based on a user request. On deletion, I should
flag the document as 'marked for delete', and on a document state change
I need to update the document.
An update internally does a delete and an add. I am committing the writer
and re-opening the reader every minute.

Suppose that, within that minute, the user deletes a document from the UI.
If I use IndexWriter, the document is updated, but the reader is refreshed
only after a minute. If the user refreshes his page in the meantime, he
could see the deleted item again.

To avoid this situation, I plan to:
 1. Delete the document using the reader.
 2. Add the document with its new state using the writer.

I think we can't avoid using DeleteDocument of the reader. Please suggest
any other way, if there is one.

Regards
Ganesh

----- Original Message ----- 
From: "Antony Bowesman" <adb <at> teamware.com>
To: <java-user <at> lucene.apache.org>
Sent: Wednesday, November 26, 2008 4:00 AM
Subject: Re: Which is faster/better

> Michael McCandless wrote:
>>
>> If you have nothing open already, and all you want to do is delete

Ganesh | 1 Dec 2008 09:28

Re: Marked for deletion

I need to index a large volume of data and I plan to shard it. The client
may not know which shard db to query; the server will take care of all
shard management. I have done almost 50% of the development with Lucene.

In the case of Solr, I think the client has to be aware of which core or
instance it wants to communicate with?

Regards
Ganesh

----- Original Message ----- 
From: "Erik Hatcher" <erik <at> ehatchersolutions.com>
To: <java-user <at> lucene.apache.org>
Sent: Tuesday, November 25, 2008 4:45 PM
Subject: Re: Marked for deletion

>
> On Nov 25, 2008, at 5:00 AM, Ganesh wrote:
>> My index application is a separate process and my search application  is 
>> part of web ui. When User performs delete, i want to do mark for 
>> deletion.
>>
>> I think i have no other option other than to update the document,  but 
>> index app is a separate process and it uses index writer. In  order to 
>> update, I am planning to use RMI and create a single  application which 
>> does both index and search and also exposes some  search and delete 
>> methods.
>>
>> Is there any other way to achieve this?
>

Erik Hatcher | 1 Dec 2008 09:43

Re: Marked for deletion


On Dec 1, 2008, at 3:28 AM, Ganesh wrote:
> I need to index voluminous data and i plan to shard it. The client  
> may not know which shard db to query. Server will take care of  
> complete shard management. I have done almost 50% of  development  
> with Lucene.
>
> In case of Solr, i think the client should be aware of which core or  
> instance it want to communicate?

See <http://wiki.apache.org/solr/DistributedSearch>

The example shows a shards parameter being sent from a client, yes....  
but all Solr parameters can be either specified from the client or set  
in server-side configuration. So no, a client doesn't need to be aware  
of which shards to query.

	Erik
Niels Ott | 1 Dec 2008 11:14

Re: Deleting from Index by URL field: is it safe?

Hi all,

German Kondolf wrote:
> It works exactly as it does when you search of that term.
> 
> Review in your index creation, if you store it without analyzing it
> (Index.UN_TOKENIZED), it will only match that document when you have an
> exact URL.

Is that also true if I simply use the KeywordAnalyzer?

The reason why I want to do it this way is that I have a special 
Analyzer that encapsulates the "knowledge" on how to treat each field. 
In a way something like the PerFieldAnalyzerWrapper but more 
specialized. I want to use the very same Analyzer for querying as well, 
so it appears to me that it is good to have the "knowledge" about the 
treatment of fields in that single place.

> It's possible that the URL is not unique enought in your domain, there is no
> other unique identifier that you could use?

I think the URL is unique enough for my cases. The system is still a 
prototype so I can change that later, if it turns out that it doesn't do 
the job for me.

> I suggest you create a test and try it on a RAMDirectory and see exactly
> what happens and what you want!

This looks like a good idea to me. Thank you for the hint.


Michael McCandless | 1 Dec 2008 11:56

API changes in 2.4


Heads up: there are two API changes in 2.4 that might bite you on
upgrading:

   * If you are subclassing QueryParser and override addClause or
     getBooleanQuery, you need to change the argument type from Vector
     to List, else your method won't be called.  This was caused by
     LUCENE-1369, where I accidentally broke back compatibility
     (sorry!!).

   * If you use Document.getFieldables, getFields, getValues or
     getBinaryValues, these methods now return an empty list instead of
     null if there are 0 things to return.  If you have code that
     checks != null you need to change it to .size() != 0.  This was
     from LUCENE-1233

Mike
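
The two changes above can be illustrated without Lucene on the classpath.
What follows is only a minimal sketch in plain Java, not Lucene's code:
BaseParser, StaleParser, UpdatedParser and ApiChangeSketch are hypothetical
stand-ins invented for illustration. It shows why a subclass method still
declared with a Vector parameter silently stops being called once the base
class hook takes a List, and how a "!= null" check has to change once "nothing
found" is reported as an empty result. The sketch assumes the lookup returns an
array, so the check is on .length; for a collection the equivalent would be
.size() != 0.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Vector;

class ApiChangeSketch {
    // Hypothetical lookup that used to return null for "no values"
    // and now returns an empty array instead.
    static String[] valuesFor(String name) {
        return new String[0];
    }

    // A bare "!= null" test would now report true for an empty result,
    // so the emptiness check has to be explicit.
    static boolean hasValues(String[] values) {
        return values != null && values.length != 0;
    }

    public static void main(String[] args) {
        System.out.println(new StaleParser().parse("a"));    // prints "base"
        System.out.println(new UpdatedParser().parse("a"));  // prints "updated subclass"
        System.out.println(hasValues(valuesFor("missing"))); // prints "false"
    }
}

// Hypothetical stand-in for a library base class whose hook method
// changed its parameter type from Vector to List.
class BaseParser {
    protected String addClause(List<String> clauses, String clause) {
        clauses.add(clause);
        return "base";
    }

    // The library itself always calls the List-typed hook.
    String parse(String clause) {
        return addClause(new ArrayList<String>(), clause);
    }
}

// Written against the old signature: this method is now an overload,
// not an override, so parse() never reaches it.
class StaleParser extends BaseParser {
    protected String addClause(Vector<String> clauses, String clause) {
        clauses.add(clause);
        return "stale subclass";
    }
}

// Updated to the new signature; @Override makes the compiler reject
// the method if it ever stops matching the base class again.
class UpdatedParser extends BaseParser {
    @Override
    protected String addClause(List<String> clauses, String clause) {
        clauses.add(clause);
        return "updated subclass";
    }
}
```

Annotating overriding methods with @Override is one way to turn this kind of
silent breakage into a compile error on upgrade.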
tiziano bernardi | 1 Dec 2008 12:28

Pdf in Lucene?


Hi,
Is it possible to index PDF files with Lucene?
If so, how?
Thanks, Tiziano Bernardi
tiziano bernardi | 1 Dec 2008 12:37

Pdf in Lucene?


Hi,
Is it possible to index PDF files with Lucene?
If so, how?
Thanks, Tiziano Bernardi
Ian Vink | 1 Dec 2008 12:38

Hits Max # of documents?

(I'm using Lucene.NET but the APIs are close enough)
I'd like the search to always return all documents. I notice that it
seems to return only a percentage of them.

This is what I use:

Hits myHits = searcher.search(query);

Is there a way to force the searcher to give me everything?

Ian
Ian Lea | 1 Dec 2008 12:43

Re: Pdf in Lucene?

Hi

Lucene only indexes text, so you'll have to get the text out of the PDF
and feed it to Lucene.

Google for lucene pdf, or go straight to http://www.pdfbox.org/

--
Ian.

2008/12/1 tiziano bernardi <dk1982 <at> hotmail.it>:
>
>
> Hi,
> I want to index PDF files with lucene is possible?
> What like?
> Thanks Tiziano Bernardi
Ian Lea | 1 Dec 2008 12:45

Re: Hits Max # of documents?

From the (java) apidocs: Class MatchAllDocsQuery ... A query that
matches all documents.

Sounds like it should do the trick.

--
Ian.

On Mon, Dec 1, 2008 at 11:38 AM, Ian Vink <ianvink <at> gmail.com> wrote:
> (I'm using Lucene.NET but the APIs are close enough)
> I'd like the search to always return all documents always. I notice that it
> 'seems'  to return a percentage of them.
>
> Hits myHits = searcher.search(query);
>
> Is what I use.
>
> Is there a way to force the searcher to give me everything?
>
> Ian
>
