J5 | 30 Sep 00:44

is EXACT_MATCH working?

Hello,

We are using Xappy to create indexes for package searching in Fedora.
Right now the results are a bit skewed because freetext searches simply
rank by the number of times a term shows up.  I want to fix this
by using exact matching on the package name, so that if an exact match is
found we return it as the top result.  This does not seem to work.
If I do this and remove all of the other matching fields, we always get
an empty result:

iconn.add_field_action('exact_name', xappy.FieldActions.INDEX_EXACT)
iconn.add_field_action('exact_name', xappy.FieldActions.STORE_CONTENT)
doc.fields.append(xappy.Field('exact_name', 'dbus',  weight=100.0))
.
.
.

Then searching for 'dbus' using xapian should return that match, but we
get an empty set:

query = qp.parse_query('dbus')
enquire.set_query(query)
matches = enquire.get_mset(0, 10)
count = matches.get_matches_estimated()
print count

> 0

How do we get count working?  BTW we are using xappy for indexing
because it presents a nice interface but xapian is simple enough on
the query side that we decided to use that for stability.

Richard Boulton | 30 Sep 11:22

Re: is EXACT_MATCH working?

On 29 September 2011 23:44, J5 <john.j5.palmieri@...> wrote:
> How do we get count working?  BTW we are using xappy for indexing
> because it presents a nice interface but xapian is simple enough on
> the query side that we decided to use that for stability.

I'm not sure what stability you mean, but ok; it's possible to do
this, but you'll need to understand a bit more about xapian internals.
 I think you'll end up replicating chunks of xappy, so I wouldn't take
this approach, personally.

I think the problem in this case is that the INDEX_EXACT action
doesn't store an unprefixed version of the term.  For an
INDEX_FREETEXT action, the text "dbus" will get indexed both as "dbus"
(for non field-specific searches) and also as something like "XAdbus"
for field-specific searches.  (dbus may also be stemmed, depending on
settings).  For an INDEX_EXACT field, you'll just get something like
the "XAdbus" term.

To search this using pure xapian, you'll have to look up what the
prefix to insert is by reading and unpacking the metadata key stored
by xappy which holds this configuration, and give it to the query
parser by calling qp.add_boolean_prefix().
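
As a rough illustration of the behaviour described above, here is a toy
model in plain Python.  It is not xappy's or xapian's real code: the
"XA" prefix is only the "something like" value mentioned above (the real
prefix has to be read from the metadata xappy stores), and every function
name below is made up for the example.  The prefixes argument plays the
role that qp.add_boolean_prefix() plays for the real query parser.

```python
# Toy model only: a dict-based inverted index, not xappy/xapian internals.
PREFIX = "XA"  # assumed prefix; the real one lives in xappy's metadata


def index_terms(value, action):
    """Return the terms one field value contributes under a given action."""
    if action == "INDEX_EXACT":
        # Exact fields get the prefixed term only.
        return [PREFIX + value]
    if action == "INDEX_FREETEXT":
        # Freetext fields get both the bare and the prefixed form of each word.
        words = value.lower().split()
        return words + [PREFIX + word for word in words]
    raise ValueError("unknown action: %s" % action)


def build_index(docs, action):
    """Map each term to the set of document ids that contain it."""
    index = {}
    for docid, value in docs.items():
        for term in index_terms(value, action):
            index.setdefault(term, set()).add(docid)
    return index


def parse_query(text, prefixes=None):
    """Turn 'field:value' into a prefixed term if the field is registered."""
    prefixes = prefixes or {}
    field, sep, value = text.partition(":")
    if sep and field in prefixes:
        return prefixes[field] + value
    return text  # no known field: the term is looked up unprefixed


def search(index, text, prefixes=None):
    """Return the sorted ids of documents matching the parsed term."""
    return sorted(index.get(parse_query(text, prefixes), set()))


docs = {1: "dbus", 2: "dbus-glib"}
exact = build_index(docs, "INDEX_EXACT")

print(search(exact, "dbus"))                                     # prints []
print(search(exact, "exact_name:dbus", {"exact_name": PREFIX}))  # prints [1]
```

The first search comes back empty for the same reason as the query in the
original post: the parser produces the bare term "dbus", and only the
prefixed term was ever written to the index.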

Really, I recommend you use xappy for searches too.

-- 
Richard

john palmieri | 30 Sep 18:10

Re: is EXACT_MATCH working?

On Fri, Sep 30, 2011 at 5:22 AM, Richard Boulton <richard@...> wrote:
> On 29 September 2011 23:44, J5 <john.j5.palmieri@...> wrote:
>> How do we get count working?  BTW we are using xappy for indexing
>> because it presents a nice interface but xapian is simple enough on
>> the query side that we decided to use that for stability.
>
> I'm not sure what stability you mean, but ok; it's possible to do
> this, but you'll need to understand a bit more about xapian internals.
>  I think you'll end up replicating chunks of xappy, so I wouldn't take
> this approach, personally.

Thanks for your reply.  I hope I didn't come across as offensive.  As it
is, the stable version of xappy doesn't quite have everything we need,
so I am using the version in svn, and even then our requirements are
somewhat different, which is always the tension between high-level
interfaces and low-level capabilities.  Because the documentation of
xapian itself is sparse, I do want to know the internals and how
it does its matching so that we can tweak it if need be.  Xappy
has given us a great jumping-off point in that respect, but, for
instance, I don't need the stored document to be pickled; JSON would
work better for us as we are simply storing strings, lists and hashes.
This seems easy to switch out by not marking any fields as
STORE_CONTENT and setting the data on the xapian document before it is
saved to the db.
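
A minimal sketch of the JSON side of that idea, using only the standard
library.  The helper names and the sample record are invented for the
example; the assumption is that the resulting string would be handed to
the xapian document with set_data() at index time and read back with
get_data() at search time, in place of the pickled data xappy stores.

```python
import json


def pack_document_data(fields):
    """Serialise a dict of strings, lists and dicts to a JSON string."""
    # sort_keys gives a stable byte-for-byte result for the same input.
    return json.dumps(fields, sort_keys=True)


def unpack_document_data(data):
    """Decode the stored JSON string back into Python objects."""
    return json.loads(data)


# Made-up sample record, only to show the three value types mentioned.
record = {
    "name": "dbus",
    "subpackages": ["dbus-libs", "dbus-devel"],
    "links": {"homepage": "http://example.org/dbus"},
}

stored = pack_document_data(record)   # would be passed to doc.set_data(...)
restored = unpack_document_data(stored)
print(restored == record)             # prints True
```

One thing to watch with this approach is that JSON only round-trips plain
types, so anything other than strings, numbers, lists and dicts would need
converting before it is stored.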

> I think the problem in this case is that the INDEX_EXACT action
> doesn't store an unprefixed version of the term.  For an
> INDEX_FREETEXT action, the text "dbus" will get indexed both as "dbus"
> (for non field-specific searches) and also as something like "XAdbus"
> for field specific searches.  (dbus may also be stemmed, depending on

J5 | 30 Sep 20:11

Re: is EXACT_MATCH working?

On Sep 30, 5:22 am, Richard Boulton <rich...@...> wrote:
> On 29 September 2011 23:44, J5 <john.j5.palmi...@...> wrote:
>
> > How do we get count working?  BTW we are using xappy for indexing
> > because it presents a nice interface but xapian is simple enough on
> > the query side that we decided to use that for stability.
>
> I'm not sure what stability you mean, but ok; it's possible to do
> this, but you'll need to understand a bit more about xapian internals.
>  I think you'll end up replicating chunks of xappy, so I wouldn't take
> this approach, personally.
>
> I think the problem in this case is that the INDEX_EXACT action
> doesn't store an unprefixed version of the term.  For an
> INDEX_FREETEXT action, the text "dbus" will get indexed both as "dbus"
> (for non field-specific searches) and also as something like "XAdbus"
> for field-specific searches.  (dbus may also be stemmed, depending on
> settings).  For an INDEX_EXACT field, you'll just get something like
> the "XAdbus" term.
>
> To search this using pure xapian, you'll have to look up what the
> prefix to insert is by reading and unpacking the metadata key stored
> by xappy which holds this configuration, and give it to the query
> parser by calling qp.add_boolean_prefix().
>
> Really, I recommend you use xappy for searches too.
>
> --
> Richard
