Erick Erickson | 1 Aug 2010 03:21

Re: Get fields from a Query object

Could you explain more about what you're trying to do? You're writing the
query after all, so you probably already know what went into it.

Which shows that I don't understand what you want to do at all.

Best
Erick

On Sat, Jul 31, 2010 at 9:41 AM, Anuj Shah <anujshahwork <at> gmail.com> wrote:

> Hi,
>
> Is there a way to get all the fields involved in a query?
>
> Thanks
>
> Anuj
>
Naama Kraus | 1 Aug 2010 09:43

Re: Rank results only on some fields

Wouldn't boosting the term AUTHOR:Manning with a boost factor of 0 do the
trick?
Naama

On Sat, Jul 31, 2010 at 11:04 AM, Philippe <mailer.thomas <at> gmail.com> wrote:

> Hi,
>
> I want to rank my results only on parts of my query. E.g., my query is
> "TITLE:Lucene AND AUTHOR:Manning". With this query, standard Lucene ranking
> takes place for both fields.
>
> However, is it possible to query the index using the full query and rank
> results only according to the "TITLE"-Field?
>
> Regards,
>    Philippe
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe <at> lucene.apache.org
> For additional commands, e-mail: java-user-help <at> lucene.apache.org
>
>
adasal | 1 Aug 2010 17:29

Semantic Search research?

Does anyone on the list know about the nature of this research or
collaboration:-
A search engine based on '25 years of cutting edge research from the Indian
Institutes of Technology, the University of Delhi, the Massachusetts
Institute of Technology, Harvard University, and the University of
California at Berkeley'?
It uses algorithms described this way:
'... distributed electronic semantic intelligence (DESI) algorithms and our
word-sense disambiguation extraction technologies can automatically identify
the referential "meanings" embedded in various search terms.'
From this description I can't get a handle on how the technology would
differ from the obvious approach: that the referential meanings in search
terms are important in disambiguating them, and that using them can form
the basis for grouping search results.
This puts the emphasis on the disambiguation of the search terms, which is a
different strategy from clustering on the basis of the topics that returned
documents contain.

In short, my question is whether anyone knows of research in this area that
might fit the description in quotes?
I am aware of projects like SireN. Does anyone have any hints on how this
and known approaches may contrast?

Adam Saltiel
Amin Mohammed-Coleman | 1 Aug 2010 21:00

hit exception flushing segment _0 - IndexWriter configuration

Hi

I am currently building an application whereby there is a remote index server (yes it probably does sound
like Solr :)) and users use my API to send documents to the indexing server for indexing.  The two methods
primarily used are add and commit. So the user can send requests for documents to be added to the index and
then can call commit.  I did a test where I simulated a user calling the add method 10 times and then in a
separate method call invoked commit.  The thing I noticed when I turned on the verbose setting for the
IndexWriter was:

hit exception flushing segment _0

It may be worth mentioning the settings I have for my index writer:

mergeFactor ="100" 
maxMergeDocs = "9999999" 

When I use my API to add 102 documents and then in a separate method call invoke a commit I get no exception.  So I
was wondering what is the best setting for the mergeFactor, and should I be experiencing this exception
after requesting a commit after adding 10 documents to the index?

Any help would be appreciated.

Thanks
Amin
Anuj Shah | 2 Aug 2010 14:16

Re: Get fields from a Query object

My code is given a query string, which we parse into a Query object, and we
would like to get a list of the fields it uses.

I'm assuming there exists a method to do so, as it seems like a useful
function. If not, should I be parsing the string for fields myself?

Anuj

On Sun, Aug 1, 2010 at 2:21 AM, Erick Erickson <erickerickson <at> gmail.com> wrote:

> Could you explain more about what you're trying to do? You're writing the
> query after all, so you probably already know what went into it.
>
> Which shows that I don't understand what you want to do at all.
>
> Best
> Erick
>
> On Sat, Jul 31, 2010 at 9:41 AM, Anuj Shah <anujshahwork <at> gmail.com> wrote:
>
> > Hi,
> >
> > Is there a way to get all the fields involved in a query?
> >
> > Thanks
> >
> > Anuj
> >
>

ArminS | 2 Aug 2010 15:36

creating tag cloud (with faceted search?) for search result (filter)


Hi guys,

I did some extensive research over the last few days, and also searched the
threads in this forum (big compliment to the users helping here!), on
creating a tag cloud from search results. But I still couldn't find anything
satisfying...

Background:

I have lots of user text comments (unstructured) stored in the database,
collected via a website as feedback.
My goal is first to make this data searchable/filterable with good
performance, and then to create a tag cloud of the results after searching
for a single term, so that I can use the tag cloud for iterative search
refinement by clicking on one term in the cloud (to filter again)! No
discussion about the tag cloud decision please :) Focus is on single
terms/tags mainly, no phrases.

Let's imagine doing a search/filter with the word "performance":

- while showing the results, a tag cloud should be generated with all the
terms connected to those text comments, in which "performance" appears.
- when I click on a word in this tagcloud, it should do the same again for
this selected word etc.
=> have the "corpus" filtered in a visualized way as a tag cloud

Actually it goes in this direction ("Drill Clouds"):
http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/Ungava


Erick Erickson | 2 Aug 2010 17:01

Re: Get fields from a Query object

Did you look at Query.extractTerms? I think that'll work for you.
Note that the query must be rewritten, and that the set of terms will
have duplicate fields. i.e. if you search field1:Erick +field1:James
I expect you'll have two terms in the set that are on field1.

Best
Erick

On Mon, Aug 2, 2010 at 8:16 AM, Anuj Shah <anujshahwork <at> gmail.com> wrote:

> My code is given a query string, which we parse into a Query object, and we
> would like to get a list of the fields it uses.
>
> I'm assuming there exists a method to do so, as it seems like a useful
> function. If not, should I be parsing the string for fields myself?
>
> Anuj
>
>
>
>
>
> On Sun, Aug 1, 2010 at 2:21 AM, Erick Erickson <erickerickson <at> gmail.com> wrote:
>
> > Could you explain more about what you're trying to do? You're writing the
> > query after all, so you probably already know what went into it.
> >
> > Which shows that I don't understand what you want to do at all.

Jason Dixon | 2 Aug 2010 18:03

Register now for Surge 2010

Registration for Surge Scalability Conference 2010 is open for all
attendees!  We have an awesome lineup of leaders from across the various
communities that support highly scalable architectures, as well as the
companies that implement them.  Here's a small sampling from our list of
speakers:

John Allspaw, Etsy
Theo Schlossnagle, OmniTI
Rasmus Lerdorf, creator of PHP
Tom Cook, Facebook
Benjamin Black, fast_ip
Artur Bergman, Wikia
Christopher Brown, Opscode
Bryan Cantrill, Joyent
Baron Schwartz, Percona
Paul Querna, Cloudkick

Surge 2010 focuses on real case studies from production environments;
the lessons learned from failure and how to re-engineer your way to a
successful, highly scalable Internet architecture.  The conference takes
place at the Tremont Grand Historic Venue on Sept 30 and Oct 1, 2010 in
Baltimore, MD.  Register now to enjoy the Early Bird discount and
guarantee your seat to this year's event!

http://omniti.com/surge/2010/register

Thanks,

--

-- 
Jason Dixon

Fernando Wasylyszyn | 3 Aug 2010 00:30

Modify how a field value is stored in Lucene

Hi all. This is my question. Currently, I'm working on a project where I have 
Lucene documents with one field that uses payloads. For this field, I use 
org.apache.lucene.analysis.payloads.DelimitedPayloadTokenFilter, so the value 
for that field is something like: "fieldValue\1.0" where '\' is the payload 
delimiter and "1.0" is the field value payload. The thing is that the values of 
this field are stored in the index, so when I retrieve the field value from the 
index, I get "fieldValue\1.0" and have to remove the payload from the field 
value manually in order to show it to the user, put it in some XML or whatever. 
Is there any way to modify how a field value is stored in the index, in the same 
way that how the value is indexed can be modified via TokenFilters? In this 
particular case, the needed modification is to remove the payload from the 
stored field value, but there are a lot of cases where something like this 
could be useful. Thanks in advance. Cheers.
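
The manual clean-up described above could be sketched roughly as follows. This is plain Java, not a Lucene API: the class name StoredValueCleaner and the method stripPayloads are hypothetical, and the sketch assumes that tokens are separated by whitespace and that everything from the first delimiter onward in a token is payload.

```java
import java.util.StringJoiner;

/** Hypothetical helper (not part of Lucene) for cleaning a stored field value. */
final class StoredValueCleaner {
    private StoredValueCleaner() {}

    /**
     * Removes the delimiter and the payload text from each token.
     * Assumption: tokens are whitespace-separated, and the payload starts
     * at the first occurrence of the delimiter within a token.
     */
    static String stripPayloads(String stored, char delimiter) {
        StringJoiner cleaned = new StringJoiner(" ");
        for (String token : stored.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // happens only for an empty or all-blank input
            }
            int cut = token.indexOf(delimiter);
            // Tokens without a delimiter carry no payload and are kept as-is.
            cleaned.add(cut >= 0 ? token.substring(0, cut) : token);
        }
        return cleaned.toString();
    }
}
```

Calling StoredValueCleaner.stripPayloads(value, '\\') on the retrieved value would then yield the text without its payloads, at the cost of doing this on every read.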

      
Anuj Shah | 3 Aug 2010 11:02

Re: Get fields from a Query object

Thanks, that does seem good in theory. I can get the field from each of the
terms and add them to a Set to de-dupe.
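
Roughly, the de-duping step would look like the sketch below. To keep it self-contained, FieldTerm is a hypothetical stand-in for a (field, text) term pair rather than Lucene's own Term class, and QueryFields and distinctFields are illustrative names.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Illustrative helper; FieldTerm is a stand-in, not Lucene's Term class. */
final class QueryFields {
    private QueryFields() {}

    /** Minimal (field, text) pair used only for this sketch. */
    static final class FieldTerm {
        final String field;
        final String text;

        FieldTerm(String field, String text) {
            this.field = field;
            this.text = text;
        }
    }

    /** Collects each field name once, in first-seen order. */
    static Set<String> distinctFields(Iterable<FieldTerm> terms) {
        Set<String> fields = new LinkedHashSet<>();
        for (FieldTerm term : terms) {
            fields.add(term.field); // a Set silently ignores repeats
        }
        return fields;
    }
}
```

With two terms on field1 and one on field2, the resulting set holds just field1 and field2.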

However, in practice queries of the following nature seem to fail with an
UnsupportedOperationException:
field:a*
field:[a TO b]

Delving into the code a bit, I see the following in the Query class:
  /**
   * Expert: adds all terms occurring in this query to the terms set. Only
   * works if this query is in its {@link #rewrite rewritten} form.
   *
   * @throws UnsupportedOperationException if this query is not yet rewritten
   */
  public void extractTerms(Set<Term> terms) {
    // needs to be implemented by query subclasses
    throw new UnsupportedOperationException();
  }

Does this imply that some concrete Query classes have not overridden this
method?
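
One possible reading of that Javadoc, as a toy model: a query type that only makes sense after rewriting may leave the base-class method in place, and only the rewritten form answers. The classes below are stand-ins written for this sketch, not Lucene's classes, and the rewrite shown (expanding a prefix into an OR over matching terms) is an assumption for illustration; whether a given Lucene Query subclass overrides extractTerms needs to be checked in its source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Toy stand-ins, NOT Lucene's classes: they only mimic rewrite-then-extract. */
abstract class ToyQuery {
    /** By default a query is treated as already being in rewritten form. */
    ToyQuery rewrite(List<String> indexedTerms) {
        return this;
    }

    /** Mirrors the quoted base-class behaviour: throws unless overridden. */
    void extractTerms(Set<String> terms) {
        throw new UnsupportedOperationException();
    }
}

final class ToyTermQuery extends ToyQuery {
    private final String term;

    ToyTermQuery(String term) {
        this.term = term;
    }

    @Override
    void extractTerms(Set<String> terms) {
        terms.add(term);
    }
}

final class ToyOrQuery extends ToyQuery {
    private final List<ToyQuery> clauses;

    ToyOrQuery(List<ToyQuery> clauses) {
        this.clauses = clauses;
    }

    @Override
    void extractTerms(Set<String> terms) {
        for (ToyQuery clause : clauses) {
            clause.extractTerms(terms);
        }
    }
}

/** Deliberately does not override extractTerms: it must be rewritten first. */
final class ToyPrefixQuery extends ToyQuery {
    private final String prefix;

    ToyPrefixQuery(String prefix) {
        this.prefix = prefix;
    }

    @Override
    ToyQuery rewrite(List<String> indexedTerms) {
        List<ToyQuery> clauses = new ArrayList<>();
        for (String term : indexedTerms) {
            if (term.startsWith(prefix)) {
                clauses.add(new ToyTermQuery(term));
            }
        }
        return new ToyOrQuery(clauses);
    }
}
```

In this model, calling extractTerms directly on a ToyPrefixQuery throws, while calling it on the result of rewrite(...) fills the set with the matching terms.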

On Mon, Aug 2, 2010 at 4:01 PM, Erick Erickson <erickerickson <at> gmail.com> wrote:

> Did you look at Query.extractTerms? I think that'll work for you.
> Note that the query must be rewritten, and that the set of terms will
> have duplicate fields. i.e. if you search field1:Erick +field1:James
> I expect you'll have two terms in the set that are on field1.

