Robert Young | 1 Jan 01:37

PHP Fatal error while indexing Wikipedia

Hi,

I'm indexing a Wikipedia dump as a way of getting to grips with how
Xapian works but I'm hitting a problem. Indexing fails with the error
pasted below. Although I haven't managed to nail down exactly which
Wikipedia article is causing the error, I am pretty sure it is the
same one each time. I will try to find out exactly which one is
causing the problem, but I was wondering if anyone has come across this
problem before. The only thing I can think it may be is a dodgy
character. Is this something which might make Xapian stumble?

PHP Fatal error:  No matching function for overloaded
'TermGenerator_index_text' in /usr/local/lib/php/xapian.php on line
1482

Thanks
Rob
Olly Betts | 1 Jan 11:35

Re: PHP Fatal error while indexing Wikipedia

On Tue, Jan 01, 2008 at 12:37:09AM +0000, Robert Young wrote:
> I'm indexing a Wikipedia dump as a way of getting to grips with how
> Xapian works but I'm hitting a problem. Indexing fails with the error
> pasted below. Although I haven't managed to nail down exactly which
> Wikipedia article is causing the error, I am pretty sure it is the
> same one each time. I will try to find out exactly which one is
> causing the problem, but I was wondering if anyone has come across this
> problem before. The only thing I can think it may be is a dodgy
> character. Is this something which might make Xapian stumble?

No, Xapian should handle arbitrary data.  The UTF-8 parsing copes with
broken UTF-8 too.

> PHP Fatal error:  No matching function for overloaded
> 'TermGenerator_index_text' in /usr/local/lib/php/xapian.php on line
> 1482

Which xapian-bindings version is this?  Line 1482 doesn't seem to match
up with my tree.

It sounds like either you're passing in parameters with the wrong type,
or it's a bug in the wrappers SWIG is generating, but it's hard to know
which without seeing your indexer code.  The generated code looks OK to
me at least.

My best guess is that maybe you are passing a string for the weight
parameter in some case - as the documentation says:

http://www.xapian.org/docs/bindings/php/
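
To illustrate the guess above, here is a minimal sketch of how a type-checked dispatcher can reject a numeric string. The function strict_index_text() is a hypothetical stand-in written for this example; it is not part of the Xapian bindings, and the real check happens inside the SWIG-generated extension code. The sketch assumes the weight arrives as a string, for example from a file or a database row.

```php
<?php
// Hypothetical stub imitating a type-checked overload dispatcher.
// It is NOT the Xapian API; it only mimics "string text, integer weight".
function strict_index_text($text, $weight) {
    if (!is_string($text) || !is_int($weight)) {
        throw new InvalidArgumentException('No matching function for overloaded call');
    }
    return $weight;
}

// Values read from a file, database or regex capture arrive as strings.
$weight_from_input = '2';

try {
    strict_index_text('some text', $weight_from_input);
    $rejected = false;
} catch (InvalidArgumentException $e) {
    $rejected = true;  // the string '2' is not an int, so no overload matches
}

// An explicit cast gives the dispatcher the type it expects.
$accepted = strict_index_text('some text', (int)$weight_from_input);
```

If the indexer does build its arguments from external data, casting explicitly (for instance (int)$weight and (string)$text) just before the call is one way to rule this cause out.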


Robert Young | 2 Jan 00:50

Re: PHP Fatal error while indexing Wikipedia

> Which xapian-bindings version is this?  Line 1482 doesn't seem to match
> up with my tree.
1.0.4
The function is
  function index_text($text,$weight=1,$prefix=null) {
    switch (func_num_args()) {
    case 1: case 2: TermGenerator_index_text($this->_cPtr,$text,$weight); break; // <-- line 1482
    default: TermGenerator_index_text($this->_cPtr,$text,$weight,$prefix);
    }
  }

> It sounds like either you're passing in parameters with the wrong type,
> or it's a bug in the wrappers SWIG is generating, but it's hard to know
> which without seeing your indexer code.  The generated code looks OK to
> me at least.
>
> My best guess is that maybe you are passing a string for the weight
> parameter in some case - as the documentation says:
>
> http://www.xapian.org/docs/bindings/php/
As you can see from the code above, it doesn't look like it can be
getting anything other than 1 or 2 for the weight. Also, about 150,000
documents go in without a problem before this one, in exactly the same
way.

>     One thing to be aware of though is that SWIG implements dispatch
>     functions for overloaded methods based on the types of the
>     parameters, so you can't always pass in a string containing a number

Olly Betts | 2 Jan 01:15

Looking back, and forwards!

This seems an appropriate moment to look back at the past year, and also
forward to the next.

A year ago, we were nearly two months after the release of 0.9.9
(2006-11-09), yet 1.0.0 was still 4.5 months away (2007-05-17).  We did
put out a 0.9.10 release in between (2007-03-04), consisting of 0.9.9
plus backported bug-fixes, but I feel this was still much too long an
interval between releases.

Mostly this was because we pretty much decided upon the features for
1.0.0 and then worked towards them.  I think we need to balance features
against time better in future.  It would have been hard to have picked
out a much reduced subset of the Unicode/UTF-8 related changes, but
these weren't the only changes.

Also, in hindsight, I think we probably merged the UTF-8 branch into the
trunk too soon.  Ideally we want to keep trunk as close as possible to a
state we'd be happy to release - then we can easily decide it's time to
hold back some planned features and make a release.  Developing new
features on branches can help here, though it has its own problems.

Once 1.0.0 was out, we achieved our aim of making a new release about
every 1-2 months (5 releases in about 7.5 months).  These releases
included some exciting new features (e.g. spelling correction, synonyms,
user metadata, OP_SCALE_WEIGHT, more flexible sorting of results), some
big efficiency improvements for various cases, and a good sprinkling of
bug fixes.

The documentation has improved - in particular we now have a series of
"topic" documents to complement the doxygen-collated API documents.

Olly Betts | 2 Jan 01:22

Re: PHP Fatal error while indexing Wikipedia

On Tue, Jan 01, 2008 at 11:50:35PM +0000, Robert Young wrote:
> > Which xapian-bindings version is this?  Line 1482 doesn't seem to match
> > up with my tree.
> 1.0.4
> The function is
>   function index_text($text,$weight=1,$prefix=null) {
>     switch (func_num_args()) {
>     case 1: case 2: TermGenerator_index_text($this->_cPtr,$text,$weight); break; // <-- line 1482
>     default: TermGenerator_index_text($this->_cPtr,$text,$weight,$prefix);
>     }
>   }
> 
> > It sounds like either you're passing in parameters with the wrong type,
> > or it's a bug in the wrappers SWIG is generating, but it's hard to know
> > which without seeing your indexer code.  The generated code looks OK to
> > me at least.
> >
> > My best guess is that maybe you are passing a string for the weight
> > parameter in some case - as the documentation says:
>
> As you can see from the code above, it doesn't look like it can be
> getting anything other than 1 or 2 for the weight.

I think you must be misreading the code - the switch is on
"func_num_args()", which in PHP returns the number of parameters which
were passed to the current function/method.  So $weight is either 1 or
whatever was passed to the index_text method.
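
A small sketch of that behaviour, using a hypothetical function index_text_demo() that copies the shape of the wrapper method but only reports what it would forward (it does not call Xapian):

```php
<?php
// Hypothetical stand-in for the generated wrapper method. It returns the
// values the real method would pass on instead of calling the extension.
function index_text_demo($text, $weight = 1, $prefix = null) {
    switch (func_num_args()) {
    case 1: case 2:
        // One or two arguments were passed: $weight is either the
        // default (integer 1) or exactly what the caller supplied.
        return array('argc' => func_num_args(), 'weight' => $weight);
    default:
        return array('argc' => func_num_args(), 'weight' => $weight,
                     'prefix' => $prefix);
    }
}

$a = index_text_demo('some text');       // one argument, default weight
$b = index_text_demo('some text', '5');  // two arguments, string weight

var_dump(is_int($a['weight']));     // bool(true)
var_dump(is_string($b['weight']));  // bool(true)
```

The switch selects on how many arguments the caller supplied, not on the value of $weight, so a caller that passes a string as the second argument sends that string straight through to the underlying function.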


Robert Young | 2 Jan 09:04

Re: PHP Fatal error while indexing Wikipedia

On Jan 2, 2008 12:22 AM, Olly Betts <olly <at> survex.com> wrote:
> On Tue, Jan 01, 2008 at 11:50:35PM +0000, Robert Young wrote:
> > > Which xapian-bindings version is this?  Line 1482 doesn't seem to match
> > > up with my tree.
> > 1.0.4
> > The function is
> >   function index_text($text,$weight=1,$prefix=null) {
> >     switch (func_num_args()) {
> >     case 1: case 2: TermGenerator_index_text($this->_cPtr,$text,$weight); break; // <-- line 1482
> >     default: TermGenerator_index_text($this->_cPtr,$text,$weight,$prefix);
> >     }
> >   }
> >
> > > It sounds like either you're passing in parameters with the wrong type,
> > > or it's a bug in the wrappers SWIG is generating, but it's hard to know
> > > which without seeing your indexer code.  The generated code looks OK to
> > > me at least.
> > >
> > > My best guess is that maybe you are passing a string for the weight
> > > parameter in some case - as the documentation says:
> >
> > As you can see from the code above, it doesn't look like it can be
> > getting anything other than 1 or 2 for the weight.
>
> I think you must be misreading the code - the switch is on
> "func_num_args()", which in PHP returns the number of parameters which
> were passed to the current function/method.  So $weight is either 1 or
> whatever was passed to the index_text method.

Yannick Warnier | 2 Jan 13:02

Wiki link from doc confusing

Hello,

Happy New Year.

The Xapian documentation page [1] mentions the wiki and provides a link
to it, but the link itself points to a general page of MoinMoin Wiki,
which doesn't help with anything regarding Xapian.

I think it would be far more useful if it linked to
http://wiki.xapian.org/IndexTitre or any better content-listing page.

Yannick

[1] http://www.xapian.org/docs/
Olly Betts | 2 Jan 14:18

Re: Wiki link from doc confusing

On Wed, Jan 02, 2008 at 01:02:51PM +0100, Yannick Warnier wrote:
> The Xapian documentation page [1] mentions the wiki and provides a link
> to it, but the link itself points to a general page of MoinMoin Wiki,
> which doesn't help with anything regarding Xapian.

There are two links to the wiki from that page, both to
http://wiki.xapian.org/ which isn't a general MoinMoin page, but a brief
introduction to what a wiki is, followed by the contents of the wiki.

It sounds like you're seeing something different somehow.

> I think it would be far more useful if it linked to
> http://wiki.xapian.org/IndexTitre or any better content-listing page.

Hmm, perhaps your browser is set to ask for French pages and MoinMoin is
showing you a different front page.  What wiki page are you getting for
http://wiki.xapian.org/ ?  I guess we need to delete any such alternative
language front pages to avoid this problem.

Cheers,
    Olly
Olly Betts | 2 Jan 14:30

Search::Xapian 1.0.5.0 released

I've uploaded Search::Xapian 1.0.5.0 to CPAN.  For your convenience
(especially since files can take a while to propagate to the CPAN
mirrors) I've also uploaded a copy to oligarchy.co.uk - both copies are
linked to from the Xapian download page:

http://www.xapian.org/download.php

The main changes in this release are that various recently added C++
features are now wrapped for Perl, and that the ValueRangeProcessor
subclasses should now work with Perl 5.6 too.  

Cheers,
    Olly
James Aylett | 2 Jan 16:02

Re: Wiki link from doc confusing

On Wed, Jan 02, 2008 at 01:18:14PM +0000, Olly Betts wrote:

> Hmm, perhaps your browser is set to ask for French pages and MoinMoin is
> showing you a different front page.  What wiki page are you getting for
> http://wiki.xapian.org/ ?  I guess we need to delete any such alternative
> language front pages to avoid this problem.

It's worth pointing out that older MoinMoin instances (including
wiki.xapian.org) aren't great at how they handle translations. The
more recent ones allow you to override translation for the front page
differently (kind of - it's actually more clever than that) for
precisely this reason.

If we're still planning on moving to Trac, this is somewhat
irrelevant. If not, the latest MoinMoin is on atreus, so we'll get
this sorted properly when we move the websites over. Either way it'll
work itself out :-)

J

-- 
/--------------------------------------------------------------------------\
  James Aylett                                                  xapian.org
  james <at> tartarus.org                               uncertaintydivision.org
