John Wards | 2 Oct 12:22

Howto use the term iterator?

Hello,

I am scratching my head at how to use the term iterator in PHP.

I would like to know what terms are stored against a document. 

I have a document called $odoc, obtained from some matches like so:

$i = $matches->begin();
while (!$i->equals($matches->end())) {
   $n = $i->get_rank() + 1;
   $odoc = $i->get_document();
   $data = $odoc->get_data();
   $i->next();
}

How do I use the termiterator returned from:

$odoc->termlist_begin();

I have tried foreaching on it and that doesn't work. I have tried using
reflection to see if I am missing something and found a "next" method.
But again I can't seem to get that to work either.

Some simple example code would be great!

I was only needing this for debugging a document and found "delve" which
does the job, but I don't like to be beaten on this one...

Cheers

Re: Making SORTAFTER useful in omega?

On 30-9-2008 6:19 Olly Betts wrote:
> 
> So splitting the match set into bands by percentage is rather arbitrary
> to start with.

Agreed, but then again the notion of 'equally relevant matches' is also 
very hard to describe.

> (Interestingly, explicitly reporting a score for each match seems to
> have fallen out of favour, perhaps due to the popularity of Google
> which doesn't provide them.)
> 
> Anecdotally, I was asked about sort bands by a couple of people before
> we removed it, and both times they found it a bit of an odd feature
> when I explained how it worked.
> 
> Unfortunately, we don't really have a replacement for combining
> relevance and sort key rankings in 1.0.x.  The nearest is probably to
> set up the weighting scheme parameters to produce less variation in
> weights (for BM25Weight: k2 = 0 and b = 0; for TradWeight: k = 0).

I tried that, but with 100 results afaik there were only a few documents 
sorted differently when switching from descending to ascending ordering 
when using tradweight with k = 0.

So that's why I came up with the idea of rounding those scores, to increase
the chance of a document being sorted ahead of a similarly scoring but
older document.

> In trunk you can use PostingSource to apply an extra weight to each

Olly Betts | 2 Oct 13:08

Re: Howto use the term iterator?

On Thu, Oct 02, 2008 at 11:22:18AM +0100, John Wards wrote:
> I have a document called $odoc like so coming from some matches.
> 
> $i = $matches->begin();
> while (!$i->equals($matches->end())) {
>    $n = $i->get_rank() + 1;
>    $odoc = $i->get_document();
>    $data = $odoc->get_data();
>    $i->next();
> }
> 
> How do I use the termiterator returned from:
> 
> $odoc->termlist_begin();

Almost exactly as you're using the MSetIterator $i above, but the method
to read the term is get_term() not get_document().  There's more
information here:

http://xapian.org/docs/bindings/php/

> I have tried foreaching on it and that doesn't work.

It would be nice if this worked, but nobody's written the wrappers
required to implement it yet.

Cheers,
    Olly
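As a rough illustration of the reply above, the loop might look like the sketch below. The calls termlist_begin(), termlist_end(), equals(), next() and get_term() are the ones named in this thread; StubTermIterator, StubDocument and list_terms() are hypothetical stand-ins so that the snippet runs without Xapian installed. With the real bindings, $odoc would come from $i->get_document().

```php
<?php
// Hypothetical stand-in for a Xapian TermIterator: it only mimics the
// equals()/next()/get_term() shape discussed in this thread.
class StubTermIterator
{
    private $terms;
    private $pos;

    public function __construct(array $terms, $pos)
    {
        $this->terms = $terms;
        $this->pos = $pos;
    }

    public function equals(StubTermIterator $other)
    {
        return $this->pos === $other->pos;
    }

    public function next()
    {
        $this->pos++;
    }

    public function get_term()
    {
        return $this->terms[$this->pos];
    }
}

// Hypothetical stand-in for a Xapian document holding a fixed term list.
class StubDocument
{
    private $terms;

    public function __construct(array $terms)
    {
        $this->terms = $terms;
    }

    public function termlist_begin()
    {
        return new StubTermIterator($this->terms, 0);
    }

    public function termlist_end()
    {
        return new StubTermIterator($this->terms, count($this->terms));
    }
}

// Same loop shape as the MSetIterator loop, but reading get_term().
function list_terms($doc)
{
    $terms = array();
    $t = $doc->termlist_begin();
    $end = $doc->termlist_end();
    while (!$t->equals($end)) {
        $terms[] = $t->get_term();
        $t->next();
    }
    return $terms;
}

$odoc = new StubDocument(array('cat', 'dog', 'fish'));
foreach (list_terms($odoc) as $term) {
    echo $term, "\n";
}
```

The end iterator is fetched once before the loop rather than on every pass; calling termlist_end() inside the loop condition, as in the original MSet loop, should behave the same.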
John Wards | 2 Oct 14:35

Re: Howto use the term iterator?

On Thu, 2008-10-02 at 12:08 +0100, Olly Betts wrote:
> > How do I use the termiterator returned from:
> >
> > $odoc->termlist_begin();
> Almost exactly as you're using the MSetIterator $i above, but the
> method
> to read the term is get_term() not get_document().  There's more
> information here:

Ah, thanks for that. I'll give it a go.
> 
> > I have tried foreaching on it and that doesn't work.
> 
> It would be nice if this worked, but nobody's written the wrappers
> required to implement it yet.

Hmm sounds like a challenge. There is iterator stuff in spl which might
be helpful.

http://uk.php.net/spl

Cheers
John
John Wards | 3 Oct 12:05

Re: Howto use the term iterator?

On Thu, 2008-10-02 at 13:35 +0100, John Wards wrote:
> > It would be nice if this worked, but nobody's written the wrappers
> > required to implement it yet.
> 
> Hmm sounds like a challenge. There is iterator stuff in spl which
> might
> be helpful.
> 
Right I've been playing with this today.

I have got it looping by doing this:

foreach($odoc->termlist_begin() as $key=>$term){
	echo "{$key}: {$term} <br/>";
}

The key is just an integer starting at 0. I am just returning the term for now
as debug output. I am planning to return the TermIterator, but I have hit
an issue.

The issue is that foreach needs to know when it hits the end of the
loop. Is there any way for a TermIterator to know it's at the end? I've had a
look at the API and I can't figure it out. I can get the end
TermIterator from the Document but I can't use that if I want to do a
true foreach. I suppose I could pass the end iterator when I call
termlist_begin() but I'd rather use built-in Xapian calls.

Cheers
John

James Aylett | 3 Oct 13:30

Re: Howto use the term iterator?

On Fri, Oct 03, 2008 at 11:05:44AM +0100, John Wards wrote:

> The issue is that foreach needs to know when it hits the end of the
> loop. Is there any way for a TermIterator to know it's at the end? I've had a
> look at the API and I can't figure it out. I can get the end
> TermIterator from the Document but I can't use that if I want to do a
> true foreach. I suppose I could pass the end iterator when I call
> termlist_begin() but I'd rather use built-in Xapian calls.

You have to use termlist_end(), as you've deduced. The Python bindings
(for instance) provide Pythonic iterators by storing the _end() as
well as initialising themselves to the _begin() so they can figure out
when to stop.

J

-- 
/--------------------------------------------------------------------------\
  James Aylett                                                  xapian.org
  james <at> tartarus.org                               uncertaintydivision.org
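The approach described above (hold on to the _end() iterator as well as the _begin() one) could be sketched in PHP roughly as follows. This is only an illustration, not code from the Xapian bindings: TermListWrapper, StubTermIterator and StubDocument are hypothetical names, and the stubs exist only so the snippet runs without Xapian installed. The wrapper relies solely on termlist_begin(), termlist_end(), equals(), next() and get_term(), as discussed in this thread.

```php
<?php
// Wraps a begin/end iterator pair so that foreach knows when to stop.
// $doc is anything exposing termlist_begin() and termlist_end().
class TermListWrapper implements Iterator
{
    private $doc;
    private $it;
    private $end;
    private $index = 0;

    public function __construct($doc)
    {
        $this->doc = $doc;
        $this->rewind();
    }

    public function rewind(): void
    {
        // Fetch both ends again so the wrapper can be looped more than once.
        $this->it = $this->doc->termlist_begin();
        $this->end = $this->doc->termlist_end();
        $this->index = 0;
    }

    public function valid(): bool
    {
        // Still valid while the current position is not the stored end.
        return !$this->it->equals($this->end);
    }

    #[\ReturnTypeWillChange]
    public function current()
    {
        return $this->it->get_term();
    }

    #[\ReturnTypeWillChange]
    public function key()
    {
        return $this->index;
    }

    public function next(): void
    {
        $this->it->next();
        $this->index++;
    }
}

// Hypothetical stand-ins mimicking the shape of the Xapian classes.
class StubTermIterator
{
    public $terms;
    public $pos;

    public function __construct(array $terms, $pos)
    {
        $this->terms = $terms;
        $this->pos = $pos;
    }

    public function equals($other) { return $this->pos === $other->pos; }
    public function next() { $this->pos++; }
    public function get_term() { return $this->terms[$this->pos]; }
}

class StubDocument
{
    private $terms;

    public function __construct(array $terms) { $this->terms = $terms; }
    public function termlist_begin() { return new StubTermIterator($this->terms, 0); }
    public function termlist_end() { return new StubTermIterator($this->terms, count($this->terms)); }
}

$odoc = new StubDocument(array('cat', 'dog'));
foreach (new TermListWrapper($odoc) as $key => $term) {
    echo "{$key}: {$term}\n";
}
```

Keeping a reference to the document, rather than only the two iterators, is a design choice made here so that rewind() can start again from the beginning; a wrapper that is only ever looped once could store just the begin and end iterators.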
Olly Betts | 3 Oct 15:15

Re: Making SORTAFTER useful in omega?

On Thu, Oct 02, 2008 at 12:24:24PM +0200, Arjen van der Meijden wrote:
> On 30-9-2008 6:19 Olly Betts wrote:
> > 
> > So splitting the match set into bands by percentage is rather arbitrary
> > to start with.
> 
> Agreed, but then again the notion of 'equally relevant matches' is also 
> very hard to describe.

I don't think that was actually the thinking behind the sort_bands
feature - it always seemed to be used with a small number of bands.
But I guess if you make the bands narrower, it's approximating that
idea.

But I'm not sure that this is a useful way to think about it.  As I see
it, the issue isn't whether these documents are equally relevant or not,
but that there are other factors than the words in the documents and the
user's query which determine how relevant a document is to that user
in response to that query.

Some examples of other factors are document age, information from link
analysis, geographical distance.

> > Unfortunately, we don't really have a replacement for combining
> > relevance and sort key rankings in 1.0.x.  The nearest is probably to
> > set up the weighting scheme parameters to produce less variation in
> > weights (for BM25Weight: k2 = 0 and b = 0; for TradWeight: k = 0).
> 
> I tried that, but with 100 results afaik there were only a few documents 
> sorted differently when switching from descending to ascending ordering 

John Wards | 3 Oct 15:29

Re: Howto use the term iterator?

On Fri, 2008-10-03 at 12:30 +0100, James Aylett wrote:
> You have to use termlist_end(), as you've deduced. The Python bindings
> (for instance) provide Pythonic iterators by storing the _end() as
> well as initialising themselves to the _begin() so they can figure out
> when to stop.

Right okay, that's easy enough then.

I might have a go at doing it at the weekend...

Re: Making SORTAFTER useful in omega?

On 3-10-2008 15:15, Olly Betts wrote:
> 
>> Do you have plans to tackle the underlying issue anytime soon?
> 
> Hmm, well PostingSource was the plan to tackle the underlying issue...
> 
> Perhaps we have different ideas what the underlying issue is - it seems
> to me to be how to combine query-based weights and sort-by-date.  What
> are you seeing it as?

In our case, there are a few situations where some sort of importance can be
derived from other variables.

For discussions on our forum, I'd like to return the most relevant 
result first. But if two results appear to be similar in relevance, the 
newest one goes first.
Then again, you could extend it to a more general notion of the fact 
that relevance apparently deteriorates over time, in which case your 
postingsource solution seems to be the right choice.

For a product search we'd like to increase the weight based on what kind 
of product it is. I.e. if someone is searching for 'asus eee pc' he 
probably isn't expecting to see an asus eee pc mouse as the first result.
That was a problem I also was intending to fix with a sorter, but I 
didn't know about your postingsource when I wrote that. The 
PostingSource seems to be a better solution for that, depending on how 
easy it is to use.

Best regards,
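The "most relevant first, newest first among similar scores" ordering can be pictured outside Xapian with plain arrays. In the sketch below the function name sort_by_band_then_date(), the 'score' and 'date' fields, the sample values and the band width of 10 are all made up for illustration; it is not how omega, SORTAFTER or PostingSource work internally.

```php
<?php
// Sort by relevance rounded into bands, newest first within a band.
// $step is the band width in percentage points (an illustrative choice).
function sort_by_band_then_date(array $results, $step)
{
    usort($results, function ($a, $b) use ($step) {
        $bandA = (int) round($a['score'] / $step);
        $bandB = (int) round($b['score'] / $step);
        if ($bandA !== $bandB) {
            return $bandB - $bandA;            // higher band first
        }
        return strcmp($b['date'], $a['date']); // newer ISO date first
    });
    return $results;
}

$results = array(
    array('id' => 1, 'score' => 91, 'date' => '2008-01-15'),
    array('id' => 2, 'score' => 89, 'date' => '2008-09-30'),
    array('id' => 3, 'score' => 62, 'date' => '2008-10-01'),
);

foreach (sort_by_band_then_date($results, 10) as $r) {
    echo $r['id'], ' ', $r['score'], ' ', $r['date'], "\n";
}
```

With these sample values, documents 1 and 2 round into the same band, so the newer document 2 is listed ahead of document 1 even though its raw score is slightly lower; document 3 stays last because it falls in a lower band.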


Olly Betts | 3 Oct 16:11

Re: Howto use the term iterator?

On Fri, Oct 03, 2008 at 02:29:35PM +0100, John Wards wrote:
> On Fri, 2008-10-03 at 12:30 +0100, James Aylett wrote:
> > You have to use termlist_end(), as you've deduced. The Python bindings
> > (for instance) provide Pythonic iterators by storing the _end() as
> > well as initialising themselves to the _begin() so they can figure out
> > when to stop.
> 
> Right okay, that's easy enough then.
> 
> I might have a go at doing it at the weekend...

Cool.

If you can work out what the code should look like for one Xapian
iterator class, the others will presumably look very similar, and we can
probably use SWIG to do the boring mechanical work.

Cheers,
    Olly
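One way to picture why the other iterator classes "will presumably look very similar": the only thing that changes from class to class is the method used to read the current item, so that method name can be passed in as a parameter. The helper xapian_each() and the StubIterator class below are hypothetical and exist only for this sketch; get_term() and get_rank() are method names that appear earlier in this thread.

```php
<?php
// Generic begin/end walker: $getter names the method that reads the
// current item (for example 'get_term' for a term list).
function xapian_each($begin, $end, $getter)
{
    for ($it = $begin; !$it->equals($end); $it->next()) {
        yield $it->$getter();
    }
}

// Hypothetical stand-in offering two different "read" methods, to show
// that the walking logic does not care which one is used.
class StubIterator
{
    private $items;
    private $pos;

    public function __construct(array $items, $pos)
    {
        $this->items = $items;
        $this->pos = $pos;
    }

    public function equals(StubIterator $other)
    {
        return $this->pos === $other->pos;
    }

    public function next()
    {
        $this->pos++;
    }

    public function get_term()
    {
        return $this->items[$this->pos];
    }

    public function get_rank()
    {
        return $this->pos;
    }
}

$items = array('alpha', 'beta');
$begin = new StubIterator($items, 0);
$end = new StubIterator($items, count($items));

foreach (xapian_each($begin, $end, 'get_term') as $term) {
    echo $term, "\n";
}
```

Note that the generator advances the iterator it is given, so $begin is used up after one pass; a fresh begin iterator is needed for each loop.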
